WO2009072095A2

WO2009072095A2 - Page indexer

Info

Publication number: WO2009072095A2
Application number: PCT/IB2008/055666
Authority: WO
Inventors: Rashmi Bajaj; Wei Li
Original assignee: France Telecom
Priority date: 2007-12-06
Filing date: 2008-12-03
Publication date: 2009-06-11
Also published as: WO2009072095A3

Abstract

The invention relates to a method to enhance the relevance of search results provided by a search engine server accessible through an operator network, said method comprising the acts of: - collecting data related to a subscriber to said operator network, - aggregating said data to build a network profile for said subscriber, - receiving a search query from said subscriber addressed to the search engine server, - transmitting the search query to said search engine server through the network, said query being associated to at least part of the subscriber network profile, - receiving search results from the search engine server, said search results being ranked according to said at least part of the subscriber network profile.

Description

PAGE INDEXER

FIELD OF THE PRESENT INVENTION:

The present invention generally relates to search engines, and more specifically to search engines taking into account the tracking of users' behaviors.

BACKGROUND OF THE PRESENT INVENTION:

Today there is an explosion of information accessible through the Internet. Identifying specific information can sometimes become a real burden.

A plurality of search engines are currently available to help the user find and select hyperlinks to web pages with information relevant to him/her. Generally, a user will provide to a search engine a number of search terms, through a search query, that express his/her interests. The search engine will then return a list of links more or less relevant to the user's search query. The list of links, also called hits, is the result of the matching of the search terms with pre-stored web pages (collected e.g. through a web crawler).

By web crawler, one may understand a program or automated script which browses the World Wide Web in a methodical and automated manner. The process is called Web crawling. Crawlers are also known as spiders or bots. Many sites, in particular search engines, use web crawling as a means of providing up-to-date data. Web crawlers are mainly used to create a copy of all the visited pages for later processing and indexing to allow fast searches.

Some search engines may offer to rank the hits in terms of relevancy to the user. Page indexers are another critical part of search engines, and they allow the ranking of hits. Since the invention of the World Wide Web, researchers have been trying to leverage all possible information sources, to extract more and more knowledge about a web page, to build a successful ranking system (i.e. an index) upon which a search engine may rely. The Web pages, or Web documents, that are retrieved by a web crawler are then analyzed by a page indexer. Data collected from each web page are then added to the index of the search engine. When a user sends a query to a search engine site, i.e. to the search engine server, the user input is checked against the search engine's index of all the web pages it has analyzed. The best URLs are then returned to the user as hits, ranked in order with the best results at the top.

Search engines have evolved over time in order to provide more relevant hits to their users. AltaVista was one of the first generation search engines (from 1995). Only "on page" data and text data were used in this generation. Thus the search was based on word frequency, language and other structural page information. Other examples included Excite and Lycos.

The second generation (from 1998) was based on usage of "off-page" and web-specific data. In this generation, search was refined based on link analysis, click-through data and anchor text. Which results people clicked and how people referred to a page were taken into account. Google, notably through its PageRank® approach, made this kind of search popular and to this day every search engine supports this approach.

Currently, the most popular way of searching and ranking pages is based on the information provided by Web Content combined with the Web Structure.

Web Content may refer to the entire content on the web in generic terms. This could be words (text), pictures, images or sounds. Usually though, the reference to web content is for the text. Web content in this sense is therefore the 'information' a website provides. For instance, web content can be compared to a general index of all keywords (or pages in this case). There is no ranking/priority associated.

Web structure may be seen as the web content with consideration of the hyperlinks associated to it. Thus it covers the relationship (via links) that exists between the various web pages. Thus the 'organization of the web content' i.e. the 'web structure', plays a key role in determining the search results in a search engine since some pages have higher priority than the others. Google's PageRank® allocation is purely determined based on the web structure.

An example of a basic search engine architecture is shown in FIG. 1. Crawlers 10 browse information sources 15 and copy all the visited pages into a repository 20. A keyword indexer 30 collects all the keywords of the pages copied in repository 20 to determine the web content 35. A link analyzer 40 is used to determine the web structure 45. When a user sends a query to a query engine 50, for example via a search page displayed on a web browser, the keywords of the query are matched with the web content 35. The hits are then ranked through a ranking engine 55 using the content of the web structure 45. An optional snippet extraction 60 (i.e. extraction of a relevant passage of the selected pages) may be achieved before returning the search results to the user.

The upcoming new generation or Third Generation of search engines will focus on understanding and answering the needs behind the user query itself. This leads to semantic analysis to find what the query is about. The goal is thus to focus on the user and his/her actual needs rather than the query itself.

Although the developments of search engines have been numerous, to this day known search engine indexers rely solely upon limited user information.

It would be interesting at this point to include the users' experiences, individually or collectively, in the ranking system of a search engine to enhance the relevance of hits returned to a user's query.

SUMMARY OF THE PRESENT METHOD AND SYSTEM:

It is an object of the present system to overcome disadvantages and/or make improvements in the prior art.

To that extent, the present invention proposes a method to index a webpage accessed by a plurality of subscribers of an operator network, said method comprising the acts of:

- collecting data related to said accessed webpage, said data comprising at least one of three elements, said elements being:

- the subscribers who have accessed said webpage,

- the locations where from the webpage was accessed, and,

- the times of access of said webpage,

- calculating a page weight for said accessed webpage based on the collected data,

- indexing the accessed webpage with said calculated weight.

The present method allows the scope of calculating a "webpage weight" to be expanded by taking into account the usage data and network profile of users. The usage data may be seen as the raw data resulting from a user who is browsing the web. This information is essentially a collection of the user's behavior without any analysis. Some processing and analysis of this usage data is required to make intelligent judgments about users' preferences. The network profile may be seen as the processed output from the analyzed usage data. Once analyzed, the usage data may help the operator of a network to associate the users' usage trends with the user Customer Resource Management (CRM) profile and thus understand the user preference(s) better in order to provide personalized recommendations. From an operator standpoint, the information generated by users browsing the internet is already available for those users over that operator's network.

This information available at the network level is used in the present method to generate a user based indexing and further refine the search hits from a search engine. This network level information contains all the relevant information that completely defines the usage context. The network profile is based on the time of access, the place (geography) of access and the CRM data (including demographic information) of the user, along with the duration and frequency of specific content accessed by the user.

The invention also relates to an indexer engine to index webpages accessed by the subscribers to the operator network.

In accordance with an alternative embodiment, the present invention is related to a method to enhance the relevance of search results provided by a search engine server accessible through said network, said method comprising the acts of:

- collecting data related to a subscriber to said operator network,

- aggregating said data to build a network profile for said subscriber,

- receiving a search query from said subscriber addressed to the search engine server,

- transmitting the search query to said search engine server through the network, said query being associated to at least part of the subscriber network profile,

- receiving search results from the search engine server, said search results being ranked according to said at least part of the subscriber network profile.

The profiling data available over the operator network is valuable information that may help enhance the search results from a subscriber query, to better match his/her interests as measured through the profiling information.

BRIEF DESCRIPTION OF THE DRAWINGS:

The present system and method are explained in further detail, and by way of example, with reference to the accompanying drawings wherein:

FIG. 1 shows an exemplary embodiment of a known search engine,

FIG. 2 shows an illustration of the operator network architecture according to an exemplary embodiment of the present system,

FIG. 3 shows an illustration of a user based page weight calculation according to the present method,

FIG. 4 shows an example of a page access frequency by a user or from a location or at a given time of a day,

FIG. 5 shows an example of a time spent on a page by a user or from a location or at a given time of a day,

FIG. 6 shows an illustration of an access location page weight calculation according to the present method,

FIG. 7 shows an illustration of an access time page weight calculation according to the present method,

FIG. 8 shows an illustration of the page weight calculation according to the present method, and;

FIG. 9 shows an illustration of the operator network architecture according to an alternative exemplary embodiment of the present system.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

The following are descriptions of exemplary embodiments that when taken in conjunction with the drawings will demonstrate the above noted features and advantages, and introduce further ones. In the following description, for purposes of explanation rather than limitation, specific details are set forth such as architecture, interfaces, techniques, etc., for illustration. However, it will be apparent to those of ordinary skill in the art that other embodiments that depart from these details would still be understood to be within the scope of the appended claims. For example, the invention allows an improved indexing of documents taking into account users' network profiles, and is described hereafter in its application to web pages and web search engines. The man skilled in the art will notice that this is not the sole embodiment possible, and that the system and method according to the invention may be applied to documents available on one or more databases, accessible through a local network. Other embodiments are readily available to the man skilled in the art.

Moreover, for the purpose of clarity, detailed descriptions of well-known devices, systems, and methods are omitted so as not to obscure the description of the present system. In addition, it should be expressly understood that the drawings are included for illustrative purposes and do not represent the scope of the present system.

FIG. 2 shows an exemplary embodiment of the present system. An operator's network is illustrated on the upper part of FIG. 2 while an application server, hosting a search engine, is represented on the lower part. The search engine may be a distant node from the operator's network or part of it. For the purpose of illustration, a distant node outside the operator's network is considered.

The operator's network may be seen as all the infrastructure in control of an operator, and that provides to subscribers to said operator communication services (voice, internet, TV, ...). One or more gateways (not shown in FIG. 2) are provided to allow the subscribers to access services hosted by distant nodes, like e.g. a search engine node 280, or place a call to another network.

Through the operator's network, subscribers, i.e. users, may enjoy services like telephony 210, internet access 215 and media access 220 (TV, video, music, ...). In the present system, the operator, through its infrastructure, has access to the usage data generated by subscribers through different paths described hereafter.

One possible path is the use of network sniffers 225 that are provided to sniff, i.e. record, the webpages visited by the users, whether these pages are accessed by their mobile 210, their computer 215 or even through a media device 220. A sniffer (also known as a network analyzer or protocol analyzer or, for particular types of networks, an Ethernet sniffer or wireless sniffer) may be seen as computer software or computer hardware that can intercept and log internet packet traffic passing over a digital network or part of a network. A network sniffer may also be seen as an application that passively records any packet traffic that runs through a given network point. The sniffer will pick up all the IP packets for every internet protocol. A sniffer may be located at the network gateway to the World Wide Web or within the network for network services. An example of a sniffer may be the network analyzers provided by Packeteer® or more generally an intercepting proxy (to intercept all the traffic passing through it).

Once a sniffer has recreated the HTTP and HTTPS traffic corresponding to the packet traffic, said sniffer 225 then creates a web log file. Thus, every "hit" to a web page, including each view of an HTML document, image or other object, may be logged. The "raw" (i.e. before processing) web log file format may essentially be one line of text for each hit to a given webpage. This may contain (but is not limited to) information about the URL of the page, who was visiting that page, what time they accessed the page (a time stamp), where they came from, and what activity they had on that page. For example, the web log file for a given webpage may contain:

- the subscribers who have accessed said webpage,

- the location where from the webpage was accessed,

- the time of access of said webpage,

- the duration for which the said webpage was accessed.

Various formats exist for web log files, with different content, the detail of which is beyond the scope of the present method.
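As an illustration of the kind of record listed above, the following minimal Python sketch parses one line of a raw web log into a structured hit. The tab-separated layout, the field order and the names `Hit` and `parse_hit` are assumptions made for this example only; the text does not prescribe any particular log format.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Hit:
    """One logged access to a webpage (hypothetical record layout)."""
    url: str
    subscriber_id: str
    location: str          # e.g. a zip code
    accessed_at: datetime  # time of access (time stamp)
    duration_s: float      # duration of access, in seconds


def parse_hit(line: str) -> Hit:
    """Parse one tab-separated log line.

    Assumed field order: url, subscriber, location, ISO time stamp, duration.
    """
    url, subscriber_id, location, timestamp, duration = line.rstrip("\n").split("\t")
    return Hit(url, subscriber_id, location,
               datetime.fromisoformat(timestamp), float(duration))


hit = parse_hit("http://example.com/a\tsub-1\t75001\t2008-12-03T09:15:00\t42.5\n")
```

Keeping the subscriber, location and time stamp on every hit is what later allows the same log to be grouped by user, by location or by time of access.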

The output from the network sniffers 225 may be stored in offline operator's logs, respectively the mobile logs 230, the internet logs 240 and the media logs 250. This output may be seen as offline usage data.

Another usage data source is the CRM (Customer Resource Management) databases that are also maintained in the operator's network. CRM databases 235, 245 and 255 are respectively dedicated to telephony, internet and media. They are repositories of data records relating to each customer or subscriber to the network. By way of illustration and not as a limitation, a data record within any of the CRM databases may comprise subscription details such as:

- for how long the customer has signed up for services,

- what services (data/voice),

- what grade of service (basic, medium, best class), i.e. his/her level of subscription,

- number of lines,

- user's payment score,

- history of actions (orders, complaints, proposals),

- user preferences, and;

- user personal data (e.g. name, age, address, fixed telephone, date of birth, profession, personal email address), also referred to as the demographic data of a user hereafter.

The whole usage data collected (the mobile, internet and media operator's logs 230, 240 and 250 respectively, as well as the mobile, internet and media CRM 235, 245 and 255 respectively) for the subscribers is processed by a profile engine 260 to determine the subscribers' network profiles. Known profiling techniques, readily available to the man skilled in the art, may be used at this point by the profile engine 260 to aggregate the collected content data into subscribers' network profiles. The profiling may be approached for instance from a user point of view or from an access (duration, frequency or location) point of view.

In order to process the usage data described here above, the profile engine 260 may comprise a web log analyzer. A web log analyzer is a piece of software that parses a web log file to derive an indication of who visits a webpage, when and how. The analysis may be carried out "off line", i.e. after the operator's logs are generated, e.g. on a scheduled basis such as when the network's activity is lower. The analysis may also be carried out "on the go" (as seen in FIG. 2 with the direct line between the sniffers 225 and the profile engine 260), i.e. in real time.

A web log analyzer may e.g. derive from the operator's logs for each webpage:

- the duration of access of said webpage, and,

- the frequency of access of the said webpage.
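The two quantities above can be derived with a single pass over the logged hits. The sketch below is a minimal illustration, assuming each hit has already been reduced to a `(url, duration_seconds)` pair; the function name `page_stats` is hypothetical.

```python
from collections import defaultdict


def page_stats(hits):
    """Aggregate hits per webpage.

    hits: iterable of (url, duration_seconds) pairs.
    Returns {url: (access_count, total_duration_seconds)}.
    """
    counts = defaultdict(int)
    durations = defaultdict(float)
    for url, duration_s in hits:
        counts[url] += 1              # frequency of access
        durations[url] += duration_s  # cumulated duration of access
    return {url: (counts[url], durations[url]) for url in counts}


stats = page_stats([("a", 10.0), ("b", 5.0), ("a", 2.5)])
```

The same loop could be keyed on `(url, user)`, `(url, location)` or `(url, day)` to obtain the per-user, per-location and per-day series used by the weights described later.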

The profile engine will aggregate the information available for an accessed webpage with the CRM data from each user who has accessed said webpage to generate network profiling data. As for the web log analyzer, the profiling may be performed offline or on the go.

The present webpage indexing method relies upon the generated network profile data to associate a webpage weight to each accessed webpage. A page weight from an indexing method may be seen as a measure of how significant this webpage is compared to other indexed webpages.

One exemplary embodiment of the present webpage indexing method will be illustrated hereafter with the use of at least one of:

- the subscribers who have accessed a given web page,

- the location where from said web page was accessed,

- the time of access of said web page,

- the duration of access of said web page, and,

- the frequency of access of the said web page.

A wealth index, derived from the network profile data, is also used and will be defined hereafter.

More specifically, a webpage weight 840 is calculated, as can be seen in the exemplary illustration of FIG. 8, using up to three components:

- weight of a webpage based on who is accessing it, i.e. the users who accessed it during a certain period of time, called hereafter user based weight 810,

- weight of a webpage based on where from it is being accessed, i.e. based on the different locations from where this webpage is being accessed, called hereafter location based weight 820,

- weight of a webpage based on when it is being accessed (i.e. time of the day, and day of access), called hereafter time based weight 830.

The weights are calculated by the page indexer 265, also called hereafter an indexer engine, as seen in FIG. 2. The page indexer may be a computer or server and comprises a processor carrying instructions to execute the present indexing method.

User based webpage weight

This section illustrates an example of how a webpage weight may be calculated based on all the people who accessed it. This corresponds to the user based, or usage based, page weight, as illustrated in FIG. 3. Hereafter are the different acts to calculate this user based weight. Each weight calculated in acts 310 to 330 is calculated for a given webpage, accessed by Nu users, Nu being an integer greater than or equal to 1. As acts 310 to 330 are independent, they may be run consecutively (in any order) or simultaneously.

In an act 310, the weight of an individual user UWi is calculated. The user's weight for a webpage takes into account how influential this user may be among all other users. It is calculated based on at least one of two core variables:

1. Percentage of total pages accessed by him/her: e.g. if in France a total of 100 billion web pages were accessed by a total of 100 million users (for simplification, a user is defined as a machine or an entity accessing a webpage over the internet using IP), then the weight of "a user" will be based on the % of pages he/she accessed out of the total of 100 billion. This data reflects how active this user is in terms of connection to the internet, when compared to the other users in the operator's network,

2. Individual influence or wealth index UDWi: this is calculated based on the operator owned CRM data mentioned earlier. Financial data from the user are taken into account to increase the weight of a page. The more this user is willing to spend, the more interesting the webpage may be for operators to recommend, as this webpage attracts users that help generate higher revenues. The influence of user i may be calculated based on demographic information of said user i. In this illustration, it is assumed that the user influence is either a wealth index (e.g. an index giving an indication of a user's revenues, that helps to compare all the subscribers to the network between themselves) or the user's payment score pulled from the operator's CRM database. Different models may be used to define a user's influence. His/her influence or wealth index may be based on the purchase history of said user, on his/her location (city, countryside, nature of the neighborhood, ...), type of subscription (basic, advanced, ...), declared income in the demographic information, age group, ... For example, if the customer has a certain number of purchases (or any kind of related cash flow, such as online lottery, online subscription) over a certain kind of website, or a certain kind of product, this should be recorded and used as an input parameter. The time of the purchase may be considered in the calculation of the wealth index to give more weight to recent purchases:

Where:

- UDWi is the wealth index,

- Wix is the cash flow spent by user i over the x-th product he/she bought, the user having bought a total of X objects, with X an integer equal to or greater than 1,

- Wux is the average or unified cash flow for the webpage, and

- Widx is the time difference between when the purchase happened and the current date for product x.

A constant zx may be used to adjust the weight of the index. We can see in this model that the more times the customer purchases an object or item, and the more recent the purchase, the larger the wealth index will be.

The weight of an individual user UWi may be seen as the product of the user weight based on the number of pages visited by said user and the user weight based on his/her individual influence. Now let:

UWi => User weight of user i

U => set of all users under consideration, the set comprising n users,

UPWi => User weight based on percentage of pages visited by user i

NPUi => Number of pages visited by user i

UDWi => User influence or wealth index.

Then:

$$UW_i = UPW_i \times UDW_i \tag{1}$$

where:

$$UPW_i = \frac{NPU_i}{\sum_{j=1}^{n} NPU_j} \times 100 \quad \text{and} \quad i \in \{U\}$$

UWi could be limited to UPWi or UDWi if one of these data is defined as more important than the other one. In the here above relation, the NPUi parameter is measured over a given period of time (preceding the calculation).

Depending on the length of that period of time, the most recent behavior of user i will be favored or not.
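The user weight of act 310 can be sketched in a few lines of Python. This is only an illustration of the product of the page-share percentage and the wealth index described above; the function name `user_weights`, the dictionary inputs and the sample values are assumptions made for the example.

```python
def user_weights(pages_visited, wealth_index):
    """Compute UW_i = UPW_i * UDW_i for every user.

    pages_visited: {user: NPU_i}, pages visited over the chosen period.
    wealth_index:  {user: UDW_i}, influence or wealth index per user.
    UPW_i is the percentage of all visited pages attributed to user i.
    """
    total_pages = sum(pages_visited.values())
    if total_pages == 0:
        raise ValueError("no pages visited in the period under consideration")
    return {
        user: (100.0 * npu / total_pages) * wealth_index[user]
        for user, npu in pages_visited.items()
    }


# Hypothetical sample: user "a" visits fewer pages but has a higher wealth index.
uw = user_weights({"a": 30, "b": 70}, {"a": 2.0, "b": 1.0})
```

Restricting the weight to one factor only, as the text allows, amounts to replacing the other factor by 1.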

In another act 320, the web page weight based on the visit frequency of the same user is estimated. This page weight is calculated based on how many times that page was accessed by user i. This will help define the importance of that page for that individual.

E.g. if a person visits the www.nytimes.com page many times every day, then this behavior will be captured by this page weight component. An example of a page access frequency by a user is shown in FIG. 4, wherein the number of accesses is plotted as a function of the calendar days, the first visit being on day 1 (D1) and the accesses being registered over a period of n days (D1 to Dn).

This weight is calculated as the weighted average of the access frequency for the same given webpage:

$$IUAF\_PW = \frac{\sum_{k=1}^{n} AF_k \times W_k}{\sum_{k=1}^{n} W_k} \tag{2}$$

wherein:

IUAF_PW => page weight based on individual user access frequency,

AFk => access frequency of the given page by a specific user on day Dk,

Wk => weight of day k,

n => total number of days.

Regarding the calculation of Wk, the weight of day k may be calculated based directly on the total number of the pages that were viewed on day k by all subscribers, i.e. as the percentage of the pages viewed on day k out of the pages viewed over the whole set of days:

$$W_k = \frac{Np_k}{\sum_{k=1}^{n} Np_k} \times 100$$

where n is the number of days in the given set over which the weight IUAF_PW is calculated and Npk is the number of pages accessed on day k. The given set of days may be a week, i.e. Monday to Sunday, hence n = 7. The set could be a month, i.e. n = 30 (on average) in this case. The here above formula (2) is described in relation to Wk being the weight of a day. In some cases, the same formula can be extended to a Wk that represents the weight of a month k or of any other given period of time. When Wk represents a month, n = 12 for instance. In this case Wk represents the percentage of the pages that were viewed in the month k. As Wk is used again later on, the same remarks apply here below.
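The weighted average of formula (2) can be illustrated as follows. The sketch takes Wk as the percentage of all pages viewed on day k, which is how the text describes Wk for the monthly case; the function names and the sample figures are assumptions for this example.

```python
def day_weights(pages_per_day):
    """W_k as the percentage of all pages viewed on day k (assumed form)."""
    total = sum(pages_per_day)
    if total == 0:
        raise ValueError("no pages viewed over the period")
    return [100.0 * pages / total for pages in pages_per_day]


def weighted_average(values, weights):
    """Return sum(v_k * W_k) / sum(W_k), the form shared by the page weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must cover the same days")
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Two-day example: the busier second day dominates the average.
w = day_weights([100, 300])          # pages viewed by all subscribers per day
iuaf_pw = weighted_average([2, 4], w)  # accesses to the page by one user per day
```

Because the same function accepts any per-day series, it can also be applied to the time spent per day instead of the access frequency.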

In a third act 330, the page weight based on the time spent by that user i is estimated. Another factor that may influence the page weight is how much time user i spends on that page. E.g. if an individual spends a significant amount of time every day on the www.nytimes.com website, then this behavior will be captured by this 'time spent' component of the page weight. An example of time spent per day on a page by a user is shown in FIG. 5. The duration of access is plotted as a function of the calendar days, the first visit being on day 1 (D1) and the accesses being registered over a period of n days (D1 to Dn).

This weight is determined by calculating the weighted average of the time spent:

$$IUTS\_PW = \frac{\sum_{k=1}^{n} TS_k \times W_k}{\sum_{k=1}^{n} W_k} \tag{3}$$

where:

IUTS_PW => page weight based on time spent by a user,

TSk => time spent on a page by a specific user on day Dk,

Wk => weight of day k,

n => total number of days.

In a further act 340, a network user page weight is calculated. The weights calculated in the previous acts 310 to 330 are aggregated. In an alternative embodiment of the present method, only some of these weights may be aggregated, depending on how significant these weights may be.

$$PW\_U = \frac{(IUTS\_PW_i) + (IUAF\_PW_i)}{N_u} \tag{4}$$

wherein:

PW_U => user based page weight (based on the users who accessed the page),

UWi => user weight of user i calculated in act 310,

IUAF_PW => page weight based on individual user access frequency calculated in act 320,

IUTS_PW => page weight based on time spent by a user i calculated in act 330,

Nu => total number of users.
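One possible reading of the aggregation of act 340 is sketched below. Summing the per-user terms over the Nu users before dividing by Nu, and optionally scaling each term by the user weight UWi, are assumptions of this sketch: the formula as printed shows only the per-user terms and the division by Nu. The function and parameter names are hypothetical.

```python
def user_based_page_weight(per_user, user_weight=None):
    """Aggregate per-user page weights into a single PW_U (assumed reading).

    per_user:    {user: (IUTS_PW_i, IUAF_PW_i)} for the Nu users of the page.
    user_weight: optional {user: UW_i}; when given, each user's term is
                 scaled by UW_i (an assumption, not stated in formula (4)).
    """
    nu = len(per_user)
    if nu == 0:
        raise ValueError("the page must have been accessed by at least one user")
    total = 0.0
    for user, (time_spent_pw, access_freq_pw) in per_user.items():
        scale = 1.0 if user_weight is None else user_weight[user]
        total += scale * (time_spent_pw + access_freq_pw)
    return total / nu


per_user = {"a": (1.0, 3.0), "b": (2.0, 2.0)}
plain = user_based_page_weight(per_user)
scaled = user_based_page_weight(per_user, {"a": 2.0, "b": 1.0})
```

Aggregating only some of the weights, as the alternative embodiment allows, would correspond to passing zero for the component that is left out.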

Access location based page weight

This section illustrates an example of how a webpage weight may be calculated based on the location from where the page was accessed. This corresponds to an access location based page weight as illustrated in FIG. 6. Hereafter are the different acts to calculate this access location based weight.

Each weight calculated in acts 610 to 630 is calculated for a given webpage, accessed by Nu users from Nl locations, Nu and Nl being integers greater than or equal to 1. As acts 610 to 630 are independent, they may be run consecutively (in any order) or simultaneously.

In act 610, the weight of the location where from the page is being accessed is calculated. It is calculated based on at least one of two core variables:

a. Percentage of total pages accessed from a location: e.g. if in France a total of 100 billion pages is accessed from all the 10,000 zip codes, then the weight of "a zip code location" will be the % of pages accessed from that location out of the total of 100 billion accessed all over France. The granularity of the location (i.e. the scale of an individual location) may vary depending on how refined the information needs to be and is available at the network level,

b. Location's popularity and influence index: this popularity or influence index is calculated based on either the GDP (Gross Domestic Product) share of that country, or other scale of a set of locations, or based on the average monthly income, wealth index or other census/government parameters. The data is similar to the wealth index defined earlier for a user. This notion is transposed here for a location.

The location weight will be calculated based on the percentage of total pages visited from a location, where the location is a part of a set of sample locations, and also based on the popularity index of said location. Usually the locations can be identified based on the country, while for some cases locations could be a set of popular cities, or any other relevant type of locations. An example of location source is Geonames (www.geonames.org), which contains over eight million geographical names or locations, and consists of 6.3 million unique features whereof 2.2 million populated places and 1.8 million alternate names. All features are categorized into classes of different scales (county, region, country, ...) and further subcategorized into feature codes (size of the city, street, road, name of place, lake, forest, park, ...).

The locations, including their scale, may be defined at the network level, based on existing locations matched with the infrastructure of the said network. The users may also be regrouped by DSLAM (Digital Subscriber Line Access Multiplexer). The illustration hereafter will be developed using the zip code as the selected granularity of the location, but other scales of location may be used. Let:

LWm => Location weight of location m

L => set of all locations under consideration, with a total number of Nl locations,

LPWm => Location weight based on percentage of pages visited at location m

NPLm => Number of pages visited at location m

LIWm => Location weight based on popularity index of the location m.

The parameter LWm may be calculated as the product of the location weight based on the percentage of the pages visited from that location and the location weight based on the popularity index:

$$LW_m = LPW_m \times LIW_m \tag{5}$$

where:

$$LPW_m = \frac{NPL_m}{\sum_{j=1}^{N_l} NPL_j} \times 100 \quad \text{and} \quad m \in \{L\}$$

LWm could be limited to LPWm or LIWm if one of these data is defined as more important than the other one. In the here above relation, the NPLm parameter is measured over a given period of time (preceding the calculation).

Depending on the length of that period of time, the most recent behavior of location m will be favored or not.
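The location weight of act 610 can be sketched with the zip code as granularity. The `mode` switch below illustrates the option of limiting the weight to one of its two factors; the function name, the mode labels and the sample zip codes and index values are assumptions of this example.

```python
def location_weights(pages_by_location, popularity_index, mode="product"):
    """Compute a location weight per location.

    pages_by_location: {location: pages visited from it over the period}.
    popularity_index:  {location: popularity or influence index}.
    mode: "product" multiplies page share (in %) by the popularity index;
          "share" keeps only the page share; "popularity" keeps only the index.
    """
    total_pages = sum(pages_by_location.values())
    if total_pages == 0:
        raise ValueError("no pages visited from any location")
    weights = {}
    for location, pages in pages_by_location.items():
        share = 100.0 * pages / total_pages
        popularity = popularity_index[location]
        if mode == "product":
            weights[location] = share * popularity
        elif mode == "share":
            weights[location] = share
        elif mode == "popularity":
            weights[location] = popularity
        else:
            raise ValueError(f"unknown mode: {mode}")
    return weights


# Hypothetical zip codes and index values.
lw = location_weights({"75001": 25, "69001": 75}, {"75001": 1.2, "69001": 0.8})
```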

In another act 620, a page weight is calculated based on the visit frequency of that page from "A zip code location" residents. After calculating the location weight, a page weight is calculated based on "how many times that page was accessed from a certain location". This will help define the importance of that page for that location.

E.g. if the residents of a certain location visit the www.nytimes.com page many times every day, then this behavior will be captured by this page weight component. An example of a page access frequency from a zip code is shown in FIG. 4, wherein the number of accesses is plotted as a function of the calendar days, the first visit being on day 1 (D1) and the accesses being registered over a period of n days (D1 to Dn). This weight is calculated as the weighted average of the access frequency:

$$ILAF\_PW = \frac{\sum_{k=1}^{n} AF_k \times W_k}{\sum_{k=1}^{n} W_k} \qquad (6)$$

wherein:

ILAF_PW = page weight based on individual locations access frequency,

AFk = access frequency of the page at a specific location on day Dk,

Wk = weight of day k,

n = total number of days.
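As a rough illustration of the weighted average in relation (6), the following Python sketch computes it from two lists of daily values. The function name and the sample numbers are assumptions for the example; the same helper would apply unchanged to daily time-spent values.

```python
def weighted_average(daily_values, day_weights):
    """Weighted average of per-day values (AFk) with per-day weights (Wk)."""
    if len(daily_values) != len(day_weights):
        raise ValueError("one weight is required per day")
    total_weight = sum(day_weights)
    if total_weight == 0:
        raise ValueError("day weights must not all be zero")
    weighted_sum = sum(v * w for v, w in zip(daily_values, day_weights))
    return weighted_sum / total_weight


# Hypothetical data for days D1..D3: accesses per day and weight of each day.
access_frequency = [10, 20, 30]
day_weights = [1, 2, 3]
page_weight = weighted_average(access_frequency, day_weights)
```

Giving larger weights to the most recent days makes the resulting page weight track recent behavior more closely.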

In another act 630, the page weight based on the time spent by all the residents of "a zip code location" is estimated. Once a page weight has been calculated in act 620 based on how many times a zip code's residents visit that page, another factor that may influence the page weight is calculated. This one is based on how much time the zip code's residents spend on that page. E.g. if the residents in a zip code who visited this page spent a significant amount of time every day on the www.nytimes.com website, then this behavior will be captured by this page weight component. An example of the time spent per day on a page by the residents of a given zip code is shown in FIG. 5. The duration of access is plotted as a function of the calendar days, the first visit being on day 1 (D1) and the accesses being registered over a period of n days (D1 to Dn).

This weight is calculated as the weighted average of the time spent:

$$ILTS\_PW = \frac{\sum_{k=1}^{n} TS_k \times W_k}{\sum_{k=1}^{n} W_k} \qquad (7)$$

wherein:

ILTS_PW = page weight based on individual location spent time,

TSk = time spent on a page at a specific location on day Dk,

Wk = weight of day k,

n = total number of days.

In a further act 640, an access location page weight is calculated, i.e. an aggregated page weight based on the locations from where the page was accessed. The weights calculated in previous acts 610 to 630 are aggregated. In an alternative embodiment of the present method, only some of these weights may be aggregated, depending on how significant these weights may be.

$$PW\_L = \frac{\sum_{m=1}^{Nl} LW_m \times \left( ILTS\_PW_m + ILAF\_PW_m \right)}{Nl} \qquad (8)$$

with:

PW_L = access location page weight, i.e. the page weight of a given page accessed at all locations considered in the operator's network,

LWm => location weight of location m, calculated in act 610,

ILAF_PW => page weight based on individual locations access frequency, calculated in act 620,

ILTS_PW => page weight based on individual locations spent time, calculated in act 630,

Nl => total number of locations.
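The aggregation of act 640 might be sketched as follows. The sketch assumes one particular reading of relation (8), namely that the frequency-based and time-based page weights of each location are summed, weighted by that location's weight LWm, and averaged over the Nl locations; the function name and the tuple layout are hypothetical.

```python
def access_location_page_weight(per_location):
    """Aggregate per-location weights into PW_L.

    per_location: list of (lw, af_pw, ts_pw) tuples, one per location m:
        lw     - location weight LWm (act 610)
        af_pw  - page weight from access frequency at m (act 620)
        ts_pw  - page weight from time spent at m (act 630)
    """
    if not per_location:
        raise ValueError("at least one location is required")
    total = sum(lw * (af_pw + ts_pw) for lw, af_pw, ts_pw in per_location)
    return total / len(per_location)  # divide by Nl


# Two hypothetical locations.
pw_l = access_location_page_weight([(2.0, 3.0, 1.0), (1.0, 4.0, 2.0)])
```

To aggregate only some of the weights, as the alternative embodiment allows, the unused component can simply be passed as 0.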

Access time page weight

This section illustrates an example of how a webpage weight may be calculated based on the moment, i.e. the time of day, when it was accessed. This corresponds to an access time based page weight as illustrated in FIG. 7. Hereafter are the different acts to calculate this access time based weight. Each weight calculated in acts 710 to 730 is calculated for a given webpage. As acts 710 to 730 are independent, they may be run consecutively (in any order) or simultaneously.

In act 710, a weight for the "time of the day" of the webpage access is calculated. It is calculated based on the percentage of total pages accessed during that time of the day. A day may be divided into several slots, of equal duration or not, the division depending on the different moments of a day that are to be identified. In the illustration hereafter, the day is divided into three identical 8-hour slots, e.g. morning (2am to 10am), afternoon (10am to 6pm) and evening (6pm to 2am). Other slots may be used, as well as a different unit of time, e.g. a week, a month, an hour, ... instead of a day. In France, if a total of 100 billion pages were accessed during an entire day, then the weight of the "morning slot" will be the percentage of pages accessed during that slot out of the total 100 billion accessed pages. Let t be, in our example, the considered time of the day, t being either a precise moment or a time slot itself. TAWt is the weight of this time slot t over the whole day. As three main time slots are defined, three values for TAW may be calculated:

TAWt = TAW1, if t is between 2AM and 10AM

TAWt = TAW2, if t is between 10AM and 6PM

TAWt = TAW3, if t is between 6PM and 2AM

with:

$$TAW_1 = \frac{\sum_{t=2am}^{10am} Np_t}{\sum_{j \in \text{whole day}} Np_j} \times 100 \qquad (9.1)$$

$$TAW_2 = \frac{\sum_{t=10am}^{6pm} Np_t}{\sum_{j \in \text{whole day}} Np_j} \times 100 \qquad (9.2)$$

$$TAW_3 = \frac{\sum_{t=6pm}^{2am} Np_t}{\sum_{j \in \text{whole day}} Np_j} \times 100 \qquad (9.3)$$

where Npt = number of pages accessed at time t. The denominator in all three equations above is the total number of pages accessed in a 24-hour period, as the time unit in this example is a day. In the above example, 3 main time slots are defined. The equations 9.1 to 9.3 may be generalized easily if the main time slots are different, e.g. if more or fewer than 3 are defined, and/or if the main time slots are of unequal durations.
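The three slot weights can be illustrated with a short Python sketch. It assumes that page accesses are already counted per hour of the day (a list of 24 counts, index 0 being midnight to 1am); this hourly granularity, the function names and the sample counts are assumptions for the example.

```python
def slot_of(hour):
    """Map an hour of the day (0..23) to one of the three 8-hour slots."""
    if 2 <= hour < 10:
        return "TAW1"  # morning, 2am to 10am
    if 10 <= hour < 18:
        return "TAW2"  # afternoon, 10am to 6pm
    return "TAW3"      # evening, 6pm to 2am (wraps around midnight)


def time_slot_weights(pages_per_hour):
    """Return the percentage of the day's page accesses falling in each slot."""
    if len(pages_per_hour) != 24:
        raise ValueError("expected one count per hour of the day")
    total = sum(pages_per_hour)
    if total == 0:
        raise ValueError("no page accesses recorded")
    counts = {"TAW1": 0, "TAW2": 0, "TAW3": 0}
    for hour, n_pages in enumerate(pages_per_hour):
        counts[slot_of(hour)] += n_pages
    return {slot: n / total * 100 for slot, n in counts.items()}


# Hypothetical day: 10 pages per hour, except 5 per hour in the afternoon.
hourly = [10] * 24
for h in range(10, 18):
    hourly[h] = 5
taw = time_slot_weights(hourly)
```

Supporting a different number of slots, or slots of unequal durations, only requires changing `slot_of`.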

Next, in another act 720, a page weight is estimated based on the visit frequency during a certain "time of the day", by all users and at all locations. After calculating the weight of a specific time slot t of the day in act 710, a page weight is calculated based on how many times that page was accessed during the time slot t.

This will help define the importance of that page during that time t of the day. E.g. if visitors visit the given page www.nytimes.com many times during a certain part of a day (e.g. the morning), then this behavior will be captured by this page weight component. An example of the number of accesses to a given page during a time slot is shown in FIG. 4. The number of visits is plotted as a function of the calendar days, the first visit being on day 1 (D1) and the accesses being registered over a period of n days (D1 to Dn).

This weight is calculated as the weighted average of the access frequency:

$$ITAF\_PW = \frac{\sum_{k=1}^{n} AF_k \times W_k}{\sum_{k=1}^{n} W_k} \qquad (10)$$

wherein:

ITAF_PW = page weight based on individual access frequency, calculated over a given period of n days,

AFk = access frequency of the given page at a specific time on day Dk,

Wk = weight of day k,

n = total number of days.

In another act 730, the page weight based on the time spent on a given page by visitors during a certain "time of the day" is calculated. Once a page weight has been calculated in act 720 based on the visit frequency that page has received during a certain time slot, another factor of page weight is calculated. This one is based on how much time the visitors spent on that page during that time slot. E.g. if most of the time spent on a page happens during the morning slot of the day, then this behavior will be captured by this page weight component. An example of the time spent by users per day on a page is shown in FIG. 5. The duration of access is plotted as a function of the calendar days, the first visit being on day 1 (D1) and the accesses being registered over a period of n days (D1 to Dn).

$$ITTS\_PW = \frac{\sum_{k=1}^{n} TS_k \times W_k}{\sum_{k=1}^{n} W_k} \qquad (10)$$

wherein:

ITTS_PW = page weight based on individual time spent on a given page, calculated over a given period of n days,

TSk = time spent on a page at a specific time on day Dk,

Wk = weight of day k,

n = total number of days.

In a further act 740, an access time page weight is calculated, i.e. an aggregated page weight based on the time when the given page was accessed. The weights calculated in the previous acts 710 to 730 are aggregated to that effect. In an alternative embodiment of the present method, only some of these weights may be aggregated, depending on how significant these weights may be.

$$PW\_T = \frac{\sum_{t=1}^{Ns} TAW_t \times \left( ITTS\_PW_t + ITAF\_PW_t \right)}{Ns} \qquad (11)$$

wherein:

PW_T = access time page weight, i.e. the page weight of a given page based on its access time throughout a day,

TAWt = weight of the time slot t, as calculated in act 710,

ITAF_PW = page weight based on individual access frequency, calculated over a given period of n days, as calculated in act 720,

ITTS_PW = page weight based on individual time spent, calculated over a given period of n days, as calculated in act 730,

Ns = 3, as three time slots were chosen for this example.

Final Page Weight based on user, location and access time slot data

The final Page Weight of a given page is derived from the previous calculations of PW_U, PW_L and PW_T. For all users, all locations and all times of access during a day, the final Page Weight of a page may be calculated as:

$$PW = PW\_U + PW\_L + PW\_T \qquad (12)$$

The final Page Weight PW thus takes into account the User Weight, Location Weight and Time of Access Weight, and also the influence that each of these parameters has on the time spent on, and the access frequency of, a page. Thus, taking all these factors into consideration, the final result of the equation gives a popularity index of a page based on usage data and the network profile. The calculated page weights may be stored in a repository (not shown in FIG. 2) associated to the page indexer engine 265 of FIG. 2.

Thanks to the present ranking method, a webpage index may be created within the operator network. This webpage index comprises a plurality of webpages (identified through their URL address for instance, i.e. webpages that are accessed over the operator's network), each associated to a webpage weight as calculated here before.

The invention also relates to a search engine 280 that searches through the pages and documents indexed with the method according to the invention. The search engine may be available within or outside (as illustrated in FIG. 2) the operator's network. Such a search engine comprises a keyword indexer (as described for known search engines in relation to FIG. 1) that allows indexing the web content for keywords. As for known search engines, a user 270 will query the search engine 280 with one or more keywords. The search engine 280 will return a list of hits that match the user's keywords. The hits are the addresses of the webpages whose content matches the keywords. To each hit corresponds a web address and a webpage. The list of hits may already be ranked using the search engine's own ranking system. Known ranking methods are for example Page Rank® from Google.

In the present indexing method, the list of hits is sent to the page indexer 265 for re-ranking. For each hit returned by search engine 280, its page weight is retrieved from the repository of page weights (associated to page indexer 265). The list of hits is then returned to the user 270, said list being re-ranked, with the hits with the highest weights according to the present indexing method placed first (i.e. on top of the list).

As described here before, the page indexer 265 is operable to:

- receive a list of hits, the list of hits being generated by a search engine 280, either inside or outside the operator network,

- retrieve from the webpage index (the repository of page weights) a webpage weight for each webpage corresponding to a hit in the list of hits,

- rank the list of hits based on the retrieved weights.
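The three operations listed above can be sketched as a small Python function. The dictionary used as the repository of page weights, the default weight given to pages missing from the index, and the URLs are assumptions made for the illustration.

```python
def rerank_hits(hits, page_weights, default_weight=0.0):
    """Re-rank a list of hits by stored page weight, highest weight first.

    hits: list of URLs in the order returned by the search engine.
    page_weights: dict mapping a URL to its page weight PW.
    default_weight: weight assumed for pages absent from the index.

    Python's sort is stable, so hits with equal weights keep the order
    given by the search engine's own ranking.
    """
    return sorted(
        hits,
        key=lambda url: page_weights.get(url, default_weight),
        reverse=True,
    )


# Hypothetical hits and weights.
hits = ["a.example/1", "b.example/2", "c.example/3"]
weights = {"a.example/1": 1.0, "b.example/2": 5.0}
reranked = rerank_hits(hits, weights)
```

Keeping the engine's order for ties means the re-ranking refines the original ranking and does not discard it.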

Using the network usage and profiling data with a search engine:

The network usage data and profiling data of a user are interesting data for a search engine, as they can help narrow down the list of search results to better match the user's interest. Among the usage data, the demographic data is specifically of interest as it comprises the user's age, location and gender, which can be directly used to re-rank the list of hits. An illustration of another embodiment of the present system is provided in FIG. 9. The same numbers refer to the same elements as in FIG. 2.

The network usage and profiling data is available at the network profiling engine 260. Depending on the search engine used by the user, different elements from the user network profile may be transmitted along with the user search query to the search engine 280. Whether search engine 280 belongs to the operator's network or not, the search query made by the user may be intercepted, before it is sent to engine 280, by a search enhancer node 261. Search enhancer node 261 will:

- intercept the search query from user 270,

- identify the user and retrieve his/her network profile from the profiling engine 260 and its databases where the profiling information is stored,

- identify the search engine and retrieve from a contract repository 285 the contract related to this identified search engine. The contract will comprise rules defining the elements from the profiling information of the user to be passed on to the search engine along with the search query,

- forward the search query to the search engine 280, along with the relevant user profile parameters, based on the contract information.
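The behavior of the search enhancer node listed above can be sketched as follows. The in-memory dictionaries standing in for the profiling databases and the contract repository, the field names and the identifiers are all hypothetical; the sketch only shows how contract rules could select which profile elements accompany the query.

```python
from dataclasses import dataclass


@dataclass
class EnhancedQuery:
    engine: str           # identified search engine
    terms: str            # the user's original search terms
    profile_params: dict  # profile elements the contract allows to be shared


def enhance_query(user_id, engine, terms, profiles, contracts):
    """Attach to a query only the profile elements permitted by the contract.

    profiles: dict mapping a user id to that user's network profile (a dict).
    contracts: dict mapping a search engine id to the list of profile
        elements that may be passed on to that engine.
    """
    profile = profiles.get(user_id, {})
    allowed = contracts.get(engine, [])  # no contract: nothing is shared
    shared = {key: profile[key] for key in allowed if key in profile}
    return EnhancedQuery(engine=engine, terms=terms, profile_params=shared)


# Hypothetical profile and contract data.
profiles = {"user-270": {"age": 34, "location": "75001", "gender": "F"}}
contracts = {"engine-280": ["age", "location"]}
query = enhance_query("user-270", "engine-280", "news", profiles, contracts)
```

Defaulting to an empty set of shared elements when no contract exists is a design choice of this sketch, not something the description prescribes.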

As explained earlier on in relation to the sniffers 225 (analysis on the go), the information that a user is making a search request is available at the network level, whether the query is made over a mobile unit, a computer or another device connected to the network. To intercept the search query from user 270, the search enhancer node 261 may comprise a sniffer module, like sniffers 225 described in relation to FIG. 2, to determine whether a user 270 is accessing a search engine 280.

This analysis may be run in real time, so that the request is intercepted and analyzed to identify both the user and the chosen search engine. Based on the contract information retrieved from repository 285, the user's request may be completed with additional parameters that further describe the user 270.

The additional profiling information received along with the user query at the search engine node will be used to improve the relevance of the search results, the relevance being understood as better matching the user's profile. If the user's age is passed on along with the search query, the hits corresponding to websites frequently accessed by users from the same age group will be placed on top of the list of hits returned to the user. If the user's location is passed, the webpages linked to the same location will be returned on top of the list.

Obviously, readily discernible modifications and variations of the present indexer are possible in light of the above teachings. It is therefore to be understood that within the scope of the appended claims, the present invention may be practiced otherwise than as specifically described herein. For example, while described in terms of hardware/software components interactively cooperating, it is contemplated that the invention described herein may be practiced entirely in software. The software may be embodied in a carrier such as magnetic or optical disks, or a radio frequency or audio frequency carrier wave.

Thus, the foregoing discussion discloses and describes merely exemplary embodiments of the present invention. As will be understood by those skilled in the art, the present invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. Accordingly, the disclosure of the present invention is intended to be illustrative, but not limiting of the scope of the present invention, as well as other claims. The disclosure, including any readily discernible variants of the teachings herein, defines, in part, the scope of the foregoing claim terminology such that no inventive subject matter is dedicated to the public.

The section headings included herein are intended to facilitate a review but are not intended to limit the scope of the present system. Accordingly, the specification and drawings are to be regarded in an illustrative manner and are not intended to limit the scope of the appended claims.

In interpreting the appended claims, it should be understood that:

a) the word "comprising" does not exclude the presence of other elements or acts than those listed in a given claim;

b) the word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements;

c) any reference signs in the claims do not limit their scope;

d) several "means" may be represented by the same item or hardware or software implemented structure or function;

e) any of the disclosed elements may be comprised of hardware portions (e.g., including discrete and integrated electronic circuitry), software portions (e.g., computer programming), and any combination thereof;

f) hardware portions may be comprised of one or both of analog and digital portions;

g) any of the disclosed devices or portions thereof may be combined together or separated into further portions unless specifically stated otherwise;

h) no specific sequence of acts or steps is intended to be required unless specifically indicated; and

i) the term "plurality of" an element includes two or more of the claimed element, and does not imply any particular range of number of elements; that is, a plurality of elements can be as few as two elements, and can include an immeasurable number of elements.

Claims

CLAIMS

What is claimed is:

1. In an operator network, a method to enhance the relevance of search results provided by a search engine server accessible through said network, said method comprising the acts of:

- collecting data related to a subscriber to said operator network,

- transmitting the search query to said search engine server through the network, said query being associated to at least part of the subscriber network profile,

2. The method of claim 1, wherein the act of transmitting the search query is preceded by the acts of:

- identifying the search engine,

- retrieving, from a repository of contract, rules attached to said identified search engine, and,

- selecting the part of the network subscriber profile to be transmitted to the search engine server based on the retrieved contract rules.

3. The method of claim 1, wherein the network user profile comprises at least one of the user age, location, and profession.

4. In an operator network, a system to enhance the relevance of search results provided by a search engine server accessible through said network, said system comprising:

- a profiling node arranged to:

- collect data related to a subscriber to said operator network,

- aggregate said data to build a network profile for said subscriber,

- a search enhancer node arranged to:

- receive a search query from said subscriber addressed to the search engine server,

- transmit the search query to said search engine server through the network, said query being associated to at least part of the subscriber network profile,

- receiving search results from the search engine server, said search results being ranked according to said at least part of the subscriber network profile.

5. The system of claim 4, wherein prior to transmitting the search query, the search enhancer node is further arranged to:

- identify the search engine,

- retrieve, from a repository of contract, rules attached to said identified search engine, and,

- select the part of the network subscriber profile to be transmitted to the search engine server based on the retrieved contract rules.

6. The system of claim 4, wherein the network user profile comprises at least one of the user age, location, and profession.

7. A computer readable carrier including computer program instructions that cause a computer to implement a method for enhancing the relevance of search results provided by a search engine server accessible through an operator network, said computer readable carrier comprising:

- instructions for receiving a search query from a subscriber to said operator network addressed to the search engine server,

- instructions for transmitting the search query to said search engine server through the network, said query being associated to at least part of a network profile for said subscriber, said network profile resulting from aggregating data related to said subscriber and collected over said operator network,

- instructions for receiving search results from the search engine server, said search results being ranked according to said at least part of the subscriber network profile.

8. The carrier of claim 7, further comprising instructions to, prior to transmitting the search query:

- identify the search engine,

- retrieve, from a repository of contracts, rules attached to said identified search engine, and,

- select the part of the network subscriber profile to be transmitted to the search engine server based on the retrieved contract rules.

9. In an operator network, a method to index a webpage accessed by a plurality of subscribers of said operator network, said method comprising the acts of:

- collecting data related to said accessed webpage, said data belonging to at least one of three categories, said categories being:

- the subscribers who have accessed said webpage,

- the locations where from the webpage was accessed,

- the times of access of said webpage,

- calculating a page weight for said accessed webpage based on the collected data,

- indexing the accessed webpage with said calculated weight.

10. The method according to claim 9, said method further comprising the acts of collecting additional data related to the accessed webpage, said data belonging to at least one of two categories, said categories being:

- the duration of access of said webpage,

- the frequency of access of the said webpage.

11. The method of claim 10, the method further comprising for each collected category of data the acts of:

- selecting a population of data among the collected data of the category,

- calculating for the accessed webpage and for each data item of the population:

- a first page weight based on the activity of said data item, said activity depending at least upon the number of all webpages the access of which is related to said data item compared to the number of all webpages the access of which is related to the entire population,

- a second page weight based on the frequency of accesses to the accessed webpage that are in relation to said item,

- a third page weight based on the duration of access to the accessed webpage that is in relation to said item,

- averaging over the entire population the second and third page weights, both said second and third page weights being weighted with the first page weight.

12. The method of claim 11, wherein the activity is calculated according to the expression:

wherein:

Ai is the activity defined for the data item i,

Np is the size of the selected population,

NPi is the number of webpages visited in relation to data item i.

13. The method of claim 12, wherein the first page weight is further based on a wealth index defined for the data item, said wealth index being a measure of said data item revenues.

14. The method of claim 11, wherein the second and third page weights are calculated according to the expression:

$$W_i = \frac{\sum_{k=1}^{N} AH_k \times W_k}{\sum_{k=1}^{N} W_k}$$

wherein:

Wi is either the second or third page weight defined for data item i,

AHk is:

- the frequency of accesses to the accessed webpage that are related to data item i, said frequency being defined over a given unit of time k, if the second page weight is considered,

- the duration of accesses to the accessed webpage that are related to data item i, said duration being defined over a given unit of time k, if the third page weight is considered,

N is the total number of units of time under consideration,

Wk is the weight of the given unit of time, Wk being defined by:

$$W_k = \frac{NP_k}{\sum_{j=1}^{N} NP_j} \times 100$$

with NPk being the total number of accessed webpages during the given unit of time k.

15. In an operator network, an indexer engine to index a webpage accessed by a plurality of subscribers of said operator network, said indexer engine being operable to:

- collect data related to said accessed webpage, said data belonging to at least one of three categories, said categories being:

- the times of access of said webpage,

- calculate a page weight for said accessed webpage based on the collected data,

- index the accessed webpage with said calculated weight.

16. The engine of claim 15, said engine being further operable to collect additional data related to the accessed webpage, said data belonging to at least one of two categories, said categories being:

- the duration of access of said webpage,

- the frequency of access of the said webpage.

17. The engine of claim 16, the engine being further operable for each collected category of data to:

- select a population of data among the collected data of the category,

- calculate for the accessed webpage and for each data item of the population:

- a first page weight based on the activity of said data item, said activity depending at least upon the number of all webpages the access of which is related to said data item compared to the number of all webpages the access of which is related to the entire population,

- a second page weight based on the frequency of accesses to the accessed webpage that are in relation to said item,

- a third page weight based on the duration of access to the accessed webpage that is in relation to said item,

- average over the entire population the second and third page weights, both said second and third page weights being weighted with the first page weight.

18. The engine of claim 17, said engine being further operable to calculate the activity according to the expression:

wherein:

Ai is the activity defined for the data item i,

Np is the size of the selected population,

NPi is the number of webpages visited in relation to data item i.

19. The engine of claim 17, said engine being further operable to calculate the second and third page weights according to the expression:

$$W_i = \frac{\sum_{k=1}^{N} AH_k \times W_k}{\sum_{k=1}^{N} W_k}$$

wherein:

Wi is either the second or third page weight defined for data item i,

$$W_k = \frac{NP_k}{\sum_{j=1}^{N} NP_j} \times 100$$

20. A computer readable carrier including computer program instructions that cause a computer to implement a method to index a webpage accessed by a plurality of subscribers of an operator network, said data carrier comprising:

- instructions for collecting data related to said accessed webpage, said data belonging to at least one of three categories, said categories being:

- the times of access of said webpage,

- instructions for calculating a page weight for said accessed webpage based on the collected data,

- instructions for indexing the accessed webpage with said calculated weight.

21. The carrier of claim 20, said carrier further comprising instructions for collecting additional data related to the accessed webpage, said data belonging to at least one of two categories, said categories being:

- the duration of access of said webpage,

- the frequency of access of the said webpage.

22. The carrier of claim 21, said carrier further comprising for each collected category of data:

- instructions for selecting a population of data among the collected data of the category,

- instructions for calculating for the accessed webpage and for each data item of the population:

- a first page weight based on the activity of said data item, said activity depending at least upon the number of all webpages the access of which is related to said data item compared to the number of all webpages the access of which is related to the entire population,

- a second page weight based on the frequency of accesses to the accessed webpage that are in relation to said item,

- a third page weight based on the duration of access to the accessed webpage that is in relation to said item,

- instructions for averaging over the entire population the second and third page weights, both said second and third page weights being weighted with the first page weight.

23. The carrier of claim 22, said carrier comprising further instructions for calculating the activity according to the expression:

wherein:

Ai is the activity defined for the data item i,

Np is the size of the selected population,

NPi is the number of webpages visited in relation to data item i.

24. The carrier of claim 22, said carrier further comprising instructions for calculating the second and third page weights according to the expression:

$$W_i = \frac{\sum_{k=1}^{N} AH_k \times W_k}{\sum_{k=1}^{N} W_k}$$

wherein:

Wi is either the second or third page weight defined for data item i,

AHk is:

- the frequency of accesses to the accessed webpage that are related to data item i, said frequency being defined over a given unit of time k, if the second page weight is considered,

$$W_k = \frac{NP_k}{\sum_{j=1}^{N} NP_j} \times 100$$

with NPk being the total number of accessed webpages during the given unit of time k.

25. A webpage index comprising a plurality of webpages associated to a webpage weight defined using the indexing engine of claims 15 to 19.

26. In an operator network, a method to rank a list of webpages, said method comprising the acts of:

- receiving a list of webpages,

- retrieving from a webpage index a webpage weight for each webpage of the said list of webpages, said webpage index being defined according to claim 25,

- ranking the list of webpages based on the retrieved weights.

27. In an operator network, a page indexer operable to re-rank a list of hits from a search engine, said page indexer being operable to:

- receive a list of hits,

- retrieve from a webpage index a webpage weight for each webpage corresponding to a hit in the list of hits, said webpage index being defined according to claim 25,

- rank the list of hits based on the retrieved weights.