CN1906614A - Method, system, and program for handling anchor text - Google Patents

Method, system, and program for handling anchor text

Info

Publication number
CN1906614A
Authority
CN
China
Prior art keywords
document
information
anchor point
logic
anchor
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CNA2005800018061A
Other languages
English (en)
Inventor
雷纳尔·克拉夫特
安德列亚斯·纽曼
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
International Business Machines Corp
Original Assignee
International Business Machines Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by International Business Machines Corp
Publication of CN1906614A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G06F16/9558 Details of hyperlinks; Management of linked annotations
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10 TECHNICAL SUBJECTS COVERED BY FORMER USPC
    • Y10S TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10S707/00 Data processing: database and file management or data structures
    • Y10S707/99931 Database or file accessing
    • Y10S707/99933 Query processing, i.e. searching
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10 TECHNICAL SUBJECTS COVERED BY FORMER USPC
    • Y10S TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10S707/00 Data processing: database and file management or data structures
    • Y10S707/99931 Database or file accessing
    • Y10S707/99933 Query processing, i.e. searching
    • Y10S707/99935 Query augmenting and refining, e.g. inexact access

Abstract

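Disclosed are a method, system, and program for processing anchor text for information retrieval. A set of anchors that point to a target document is formed. Anchors with the same anchor text are grouped together. Information is computed for each group. Context information is generated for the target document based on the computed information.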

Description

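Method, system, and program for handling anchor text
Technical Field
The present invention relates to processing anchor text for information retrieval.
Background
The World Wide Web (also known as WWW or the "web") is a collection of Internet servers that support web pages, which may include links to other web pages. A Uniform Resource Locator (URL) indicates the location of a web page. Each web page may contain, for example, text, graphics, audio, and/or video content. For example, a first web page may contain a link to a second web page. When the link is selected in the first web page, the second web page is typically displayed.
A web browser is a software application used to locate and display web pages. Currently, there are billions of web pages on the web.
Web search engines are used to retrieve web pages on the web based on some criteria (e.g., entered via a web browser). That is, web search engines are designed to return relevant web pages given a keyword query. For example, the query "HR" issued against a company intranet search engine is expected to return relevant pages in the intranet that relate to Human Resources (HR). Web search engines use indexing techniques that associate search terms (e.g., keywords) with web pages.
An anchor may be described as a link or path (e.g., a URL) to a document. Anchor text may be described as the text associated with a path or link (e.g., a URL) to a document. For example, anchor text may be the text that labels or encloses a hypertext link in a web document. Anchor text is collected by web search engines and associated with the target document. The anchor text is also indexed together with the target document.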
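To make the notion of anchor and anchor text concrete, the following minimal sketch collects (target URL, anchor text) pairs from one HTML source document using Python's standard html.parser module. The class name AnchorExtractor, the helper extract_anchors, and the example URLs are illustrative assumptions and do not come from the patent.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AnchorExtractor(HTMLParser):
    """Collects (target_url, anchor_text) pairs from one source document."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.anchors = []   # finished (target_url, anchor_text) pairs
        self._href = None   # target of the <a> element being read, if any
        self._text = []     # text fragments seen inside that element

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the source document's URL.
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace inside the anchor text.
            text = " ".join("".join(self._text).split())
            self.anchors.append((self._href, text))
            self._href = None


def extract_anchors(html, base_url):
    """Hypothetical helper: return all anchors found in one HTML string."""
    parser = AnchorExtractor(base_url)
    parser.feed(html)
    return parser.anchors
```

For instance, a page at http://intranet.example.com/news/today.html that links to /hr/index.html with the text "Human Resources" yields one pair whose target is the absolute URL of the HR page.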
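Web search engines use context information (e.g., title, summary, language, etc.) to enrich search results. This provides the user with screened search results. Anchor text, however, may not be relevant for use as context information. For example, anchor text may be in a different language than the target document, and using anchor text without further processing may result in, for example, a Japanese title for an English document. Moreover, anchor text may be unrelated to the content of the document. For instance, anchor text may contain generic words (e.g., "click here") that occur often and are used primarily for navigation but have no meaningful value as a title. Anchor text may also be inaccurate or impolite, or may contain slang (e.g., an anchor to a "network security guide" has the anchor text "Looking for trouble?").
Moreover, generating context information is very difficult when the contents of a web page cannot be retrieved (e.g., because of a server outage, because the retrieval of web pages for processing by the search engine is incomplete, or because robots.txt prohibits access) or when a document is retrieved but cannot be analyzed (e.g., because the file is a video/audio/multimedia file, is in an unknown or unsupported format, is ill-formed, or is password protected).
When the content of a web page is unavailable, most search engines display only a Uniform Resource Locator (URL). That, however, makes it hard for a user to grasp the purpose of a search result without looking at the web page itself.
Thus, there is a need for improved document processing that provides context information for documents, such as web pages.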
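Summary of the Invention
Provided are a method, system, and program for processing anchor text. A set of anchors that point to a target document is formed. Anchors with the same anchor text are grouped together. Information is computed for each group. Context information is generated for the target document based on the computed information.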
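Brief Description of the Drawings
Referring now to the drawings, in which like reference numbers represent corresponding parts throughout:
FIG. 1 illustrates, in a block diagram, a computing environment in accordance with certain implementations of the invention.
FIG. 2 illustrates logic implemented to prepare anchors for processing in accordance with certain implementations of the invention.
FIGS. 3A and 3B illustrate logic implemented to process anchor text in accordance with certain implementations of the invention.
FIG. 4 illustrates logic for performing a document search in accordance with certain implementations of the invention.
FIG. 5 illustrates an architecture of a computer system that may be used in accordance with certain implementations of the invention.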
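Detailed Description
In the following description, reference is made to the accompanying drawings, which form a part hereof and which illustrate several implementations of the invention. It is understood that other implementations may be utilized and that structural and operational changes may be made without departing from the scope of the invention.
Certain implementations of the invention make a document available for searching by indexing anchor text instead of, or in addition to, its content. Certain implementations generate context information from the anchor text of anchors that point to a document. For example, at least a portion of the anchor text may be designated as a title or a summary of the document. It may be difficult, however, to identify meaningful anchor text, because anchor text may be in a different language than the target document, may be unrelated to the content of the document, or may be inaccurate, impolite, or contain slang. Additionally, special care is taken to remove noise from the anchor text that has no meaningful value as, for example, a title (e.g., URLs, file names, or navigational text such as "next column").
Thus, certain implementations of the invention process raw anchor text to obtain high-quality titles and summaries. Certain implementations distill raw anchor text to obtain high-quality data that can be used to generate title or summary data for search result items. The result of the raw anchor text processing improves overall search quality and, therefore, the user's experience in a document retrieval system.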
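FIG. 1 illustrates, in a block diagram, a computing environment in accordance with certain implementations of the invention. A client computer 100 is connected via a network 190 to a server computer 120. The client computer 100 may comprise any computing device known in the art, such as a server, mainframe, workstation, personal computer, hand-held computer, laptop, telephony device, network appliance, etc. The network 190 may comprise any type of network, such as a Storage Area Network (SAN), a Local Area Network (LAN), a Wide Area Network (WAN), the Internet, an intranet, etc. The client computer 100 includes system memory 104, which may be implemented in volatile and/or non-volatile devices. One or more client applications 110 and a viewer application 112 may execute in the system memory 104. The viewer application 112 provides an interface that enables searching of a set of documents (e.g., stored in one or more data stores 170). In certain implementations, the viewer application 112 is a web browser.
The server computer 120 includes system memory 122, which may be implemented in volatile and/or non-volatile devices. A search engine 130 executes in the system memory 122. In certain implementations, the search engine includes a crawler component 132, a static rank component 134, a document analysis component 136, a duplicate detection component 138, an anchor text component 140, and an indexing component 142. The anchor text component 140 includes a context information generator 141. Although components 132, 134, 136, 138, 140, 141, and 142 are illustrated as separate components, their functionality may be implemented in fewer, more, or different components than illustrated. Additionally, the functionality of components 132, 134, 136, 138, 140, 141, and 142 may be implemented at a web application server computer or other server computer that is connected to the server computer 120. Additionally, one or more server applications 160 execute in system memory 122.
The server computer 120 provides the client computer 100 with access to data in at least one data store 170 (e.g., a database). Although a single data store 170 is illustrated for ease of understanding, data in the data store 170 may be stored in data stores at other computers connected to the server computer 120.
Also, an operator console 180 executes one or more applications 182 and is used to access the server computer 120 and the data store 170.
The data store 170 may comprise an array of storage devices, such as Direct Access Storage Devices (DASDs), Just a Bunch of Disks (JBOD), Redundant Array of Independent Disks (RAID), virtualization devices, etc. The data store 170 includes data that is used with certain implementations of the invention.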
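FIG. 2 illustrates logic implemented to prepare anchors for processing in accordance with certain implementations of the invention. Control begins at block 200, where anchor text is associated with each anchor. This may be done, for example, by each user who creates an anchor. An anchor may be described as a path or link (e.g., a URL) from a source document to a target document.
In block 202, documents that are to be indexed by the search engine 130 are obtained. In certain implementations, the documents are published or pushed (e.g., as may be the case with newspaper articles) to the indexing component 142. In certain implementations, the crawler component 132 discovers, fetches, and stores the documents. In certain implementations, the crawler component 132 may discover documents based on, for example, certain criteria (e.g., documents accessed within the last month). Additionally, the crawler component 132 may discover documents in one or more data stores connected directly (e.g., data store 170) or indirectly (e.g., via another computing device (not shown)) to the server computer 120. In certain implementations, the crawler component 132 discovers, fetches, and stores web pages in the data store 170. These stored documents may be referred to as a "collection of documents".
In block 204, the document analysis component 136 performs per-document analysis. In particular, the document analysis component 136 reviews the stored documents, parses and tokenizes them, and, for each document, determines the language in which the document was written, extracts anchor text, and performs other tasks such as document categorization and ranking. The language information is stored for later use. For example, the document analysis component 136 determines whether the primary language used in a document is English, Japanese, German, etc. As part of extracting the anchor text, the document analysis component 136 also associates a proximity class with each anchor. A proximity class may be described as specifying how close a source document is to a target document (e.g., whether they are on the same server and, if so, whether they are in the same directory). The extracted anchor text is then ready to be processed by the anchor text component 140.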
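One way to picture the proximity class is as a function of the source and target URLs. The sketch below is only an illustration under stated assumptions: the description names the criteria (same server, same directory) but does not say which label goes with which case, so the mapping of "A" to same directory, "B" to same server, and "C" to different servers is hypothetical, as is the function name proximity_class.

```python
import posixpath
from urllib.parse import urlsplit


def proximity_class(source_url, target_url):
    """Classify how close a source document is to a target document.

    Assumed labels (not defined in the source text):
      "A" - same server and same directory
      "B" - same server, different directory
      "C" - different servers
    """
    src, dst = urlsplit(source_url), urlsplit(target_url)
    if src.netloc.lower() != dst.netloc.lower():
        return "C"
    if posixpath.dirname(src.path) == posixpath.dirname(dst.path):
        return "A"
    return "B"
```

Comparing host names and directory paths is deliberately simple; a production system might also normalize ports, aliases, or redirects, which this sketch does not attempt.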
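In block 206, the static rank component 134 reviews the stored documents and assigns a rank to each document. The rank may be described as the importance of the source document relative to other documents that have been stored by the crawler component 132. Any type of ranking technique may be used. For example, documents that are accessed more frequently may receive a higher rank.
In block 208, the context information generator 141 sorts the anchors by target document. This results in the set of anchors for a target document being grouped together for further processing. As will be described with reference to FIGS. 3A and 3B, each set is processed separately for each target document.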
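Block 208 amounts to bucketing anchors by the document they point to. A minimal sketch follows, assuming a hypothetical Anchor record with source_url, target_url, and text fields; none of these names appear in the patent.

```python
from collections import defaultdict
from typing import NamedTuple


class Anchor(NamedTuple):
    """Hypothetical record for one extracted anchor."""
    source_url: str
    target_url: str
    text: str


def anchors_by_target(anchors):
    """Return a dict mapping each target URL to the list of anchors pointing at it."""
    groups = defaultdict(list)
    for anchor in anchors:
        groups[anchor.target_url].append(anchor)
    return dict(groups)
```

Each value in the returned dictionary is the anchor set that the later per-target steps (language pruning, noise filtering, scoring) would work on independently.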
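FIGS. 3A and 3B illustrate logic implemented to process anchor text in accordance with certain implementations of the invention. Control begins at block 300, where the context information generator 141 determines the predominant language of the source documents of the anchors in the set of anchors that point to the target document. In certain implementations, if more than a configurable percentage of the source documents have the same language, anchors whose source document language differs from the predominant language are removed from the set. The configurable percentage may be described as a percentage that may be modified by, for example, a system administrator or another application.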
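The language pruning of block 300 can be sketched as follows. This is an illustration only: the dictionary keys source_url and source_lang, the function name prune_by_language, and the default threshold of 0.5 are assumptions, since the description only says the percentage is configurable.

```python
from collections import Counter


def prune_by_language(anchors, threshold=0.5):
    """Return (predominant_language, kept_anchors) for one target document.

    anchors: list of dicts with 'source_url' and 'source_lang' keys (assumed layout).
    threshold: assumed configurable fraction of source documents that must share
               a language before anchors in other languages are removed.
    """
    # Count each source document once, even if it holds several anchors.
    lang_of_source = {a["source_url"]: a["source_lang"] for a in anchors}
    if not lang_of_source:
        return None, []
    lang, count = Counter(lang_of_source.values()).most_common(1)[0]
    if count / len(lang_of_source) > threshold:
        return lang, [a for a in anchors if a["source_lang"] == lang]
    # No language is dominant enough: keep the set unchanged.
    return lang, list(anchors)
```

With three English source documents and one Japanese source document, the English share is 75 percent, so the Japanese anchor is dropped under the assumed 50 percent threshold.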
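In block 302, the context information generator 141 removes anchors whose anchor text contains a path (e.g., a URL), or a portion of a path, to the target document. In block 304, the context information generator 141 removes anchor text based on whether, and in what order or combination, the anchor text contains words from a configurable set of words (e.g., anchor text may be removed if it contains only words from the configurable set, contains at least a certain number of words from the configurable set, or contains words from the configurable set in a certain order). The configurable set of words may be determined by, for example, a system administrator. For example, the configurable set of words may include stop words, such as "click here" or "the".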
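The two noise filters of blocks 302 and 304 might be sketched as a single predicate. The stop-word set, the function name is_noise, and the choice of the "contains only stop words" rule are assumptions made for illustration; the description allows other combinations, and the substring matching used here for URL portions is deliberately crude.

```python
from urllib.parse import urlsplit

# Hypothetical configurable word set; the description leaves its contents to an administrator.
STOP_WORDS = {"click", "here", "the", "next", "more"}


def is_noise(anchor_text, target_url, stop_words=STOP_WORDS):
    """Return True if the anchor text should be dropped as noise."""
    text = anchor_text.strip().lower()
    if not text:
        return True
    # Block 302: the text contains the target path or a portion of it.
    url = target_url.lower()
    parts = urlsplit(url)
    filename = parts.path.rsplit("/", 1)[-1]
    portions = [p for p in (url, parts.netloc, parts.path, filename) if len(p) > 1]
    if any(p in text for p in portions):
        return True
    # Block 304 (one assumed rule): the text consists only of stop words.
    words = [w.strip(".,!?:;") for w in text.split()]
    return all(w in stop_words for w in words)
```

Under these assumptions, "Click here!" and "index.html" are filtered out for a target of http://example.com/hr/index.html, while "Human Resources portal" survives.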
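In block 306, the context information generator 141 sorts the set of anchors by anchor text and groups together anchors that have the same anchor text. In block 308, the context information generator 141 computes a weighted sum of occurrences of the text for each group. The weight of each individual occurrence of the text may be determined by the proximity class of the anchor. For example, if a first document has proximity class A, a second document has proximity class B, and a third document has proximity class C, and classes A, B, and C have weights 10, 5, and 2, respectively, then the weighted sum is 10 + 5 + 2 = 17.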
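The grouping and weighted counting of blocks 306 and 308 can be sketched in a few lines. The weights 10, 5, and 2 for classes A, B, and C are taken from the example in the text; the function name weighted_occurrences and the (text, class) pair layout are assumptions.

```python
from collections import defaultdict

# Weights from the worked example in the description.
PROXIMITY_WEIGHTS = {"A": 10, "B": 5, "C": 2}


def weighted_occurrences(anchors, weights=PROXIMITY_WEIGHTS):
    """Group anchors by identical text and sum the proximity weights per group.

    anchors: iterable of (anchor_text, proximity_class) pairs (assumed layout).
    """
    totals = defaultdict(int)
    for text, proximity in anchors:
        totals[text] += weights[proximity]
    return dict(totals)
```

Three anchors with the same text and proximity classes A, B, and C therefore contribute 10 + 5 + 2 = 17, matching the example above.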
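In block 310, the context information generator 141 computes a cumulative rank for each group. That is, each anchor in the group contributes to this rank according to the rank of its source document and its proximity class. For example, if a first document has proximity class A, a second document has proximity class B, and a third document has proximity class C, classes A, B, and C have weights 10, 5, and 2, respectively, the first, second, and third documents have static ranks 9, 13, and 16, respectively, and the cumulative rank is computed as a weighted average, then the cumulative rank is (9*10 + 13*5 + 16*2) / (10 + 5 + 2) = 187/17 = 11. Other techniques for computing the cumulative rank include taking the minimum, the maximum, or a combination of both, and giving the ranks of one proximity class precedence over those of other proximity classes, etc.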
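The weighted-average variant of the cumulative rank can be checked with a short sketch. The function name cumulative_rank and the (static_rank, proximity_class) pair layout are assumptions; the numbers come from the example in the text.

```python
def cumulative_rank(members, weights):
    """Weighted average of source-document ranks for one anchor-text group.

    members: list of (static_rank, proximity_class) pairs, one per anchor.
    weights: dict mapping proximity class to its weight.
    """
    total_weight = sum(weights[proximity] for _, proximity in members)
    if total_weight == 0:
        return 0.0
    weighted = sum(rank * weights[proximity] for rank, proximity in members)
    return weighted / total_weight
```

With ranks 9, 13, and 16 in classes A, B, and C and weights 10, 5, and 2, the call returns 187 / 17, which is 11. The minimum or maximum alternatives mentioned in the text would replace the weighted average with min or max over the ranks.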
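In block 312, the context information generator 141 computes a linguistic score for each group. In certain implementations, this score may be computed by a linguistic analysis of the text that scores how well the text can be displayed as a title. For example, displayability as a title may be determined by considering the number of words in the text (e.g., a title should be brief), further linguistic analysis of the text, statistical analysis of each word or of the number of occurrences of the words across all anchors in the set of anchors that point to the target document, or, when the target document is accessible, the similarity of the anchor to the content of the target document.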
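The description lists factors for the linguistic score but gives no formula, so the following is purely a hypothetical sketch. It combines two of the named factors, brevity and word overlap with the target document when its content is available. The word limit of 8, the equal 0.5 weighting, and the function name linguistic_score are all assumptions.

```python
def linguistic_score(text, max_words=8, target_words=None):
    """Score in [0, 1] for how well anchor text could serve as a title.

    max_words: assumed length up to which a title counts as brief.
    target_words: optional set of lowercase words from the target document.
    """
    words = text.lower().split()
    if not words:
        return 0.0
    # Brevity: full marks up to max_words, then decaying with length.
    brevity = 1.0 if len(words) <= max_words else max_words / len(words)
    if target_words is None:
        return brevity
    # Similarity: fraction of anchor words that also occur in the target.
    overlap = sum(1 for w in words if w in target_words) / len(words)
    return 0.5 * brevity + 0.5 * overlap
```

A real system would likely add the further linguistic and statistical analyses the text mentions; this sketch only shows how such factors could be folded into one number.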
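In block 314, the context information generator 141 computes a combined relevance score for each group from the weighted sum of occurrences, the cumulative static rank, and the linguistic score.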
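The text does not state how the three signals are combined. As one possible reading, a linear combination with tunable coefficients could look like the sketch below; the function name combined_relevance and the coefficient values are assumptions chosen only to bring a linguistic score in [0, 1] onto a scale comparable with the other two inputs.

```python
def combined_relevance(weighted_sum, cumulative_rank, linguistic_score,
                       coeffs=(1.0, 1.0, 10.0)):
    """Hypothetical linear combination of the three per-group signals.

    coeffs: assumed tuning coefficients (a, b, c); not taken from the source.
    """
    a, b, c = coeffs
    return a * weighted_sum + b * cumulative_rank + c * linguistic_score
```

Using the example values from the preceding blocks (weighted sum 17, cumulative rank 11) together with a linguistic score of 1.0, this sketch yields 17 + 11 + 10 = 38.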
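In block 316, the context information generator 141 generates context information for the target document. In certain implementations, the context information generator 141 selects the text of the group with the highest combined relevance score as a pseudo-title, composes an anchor-based static summary for the target document from the anchor texts of the n groups with the highest relevance scores, and infers the language of the target document T from the predominant source language.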
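Block 316 can be sketched as picking the top-scoring group for the pseudo-title and joining the top n anchor texts into a summary. The function name generate_context, the dictionary layout of the result, the "; " separator, and the default n of 3 are assumptions for illustration.

```python
def generate_context(scored_groups, predominant_language, n=3):
    """Build context information for one target document.

    scored_groups: dict mapping anchor text to its combined relevance score.
    predominant_language: language inferred from the source documents.
    n: assumed number of top groups used for the summary.
    """
    # Highest score first; ties broken alphabetically for a stable result.
    ranked = sorted(scored_groups, key=lambda text: (-scored_groups[text], text))
    if not ranked:
        return {"title": None, "summary": "", "language": predominant_language}
    return {
        "title": ranked[0],
        "summary": "; ".join(ranked[:n]),
        "language": predominant_language,
    }
```

For groups scored 38, 20, 12, and 5, the pseudo-title is the text of the group scored 38, and the summary concatenates the three highest-scoring texts.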
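Once anchor text processing is complete, the indexing component 142 generates an index using the processed anchor text.
FIG. 4 illustrates logic for performing a document search in accordance with certain implementations of the invention. Control begins at block 400, where a user submits a search request via the viewer application 112. In block 402, the search engine 130 executes the search request. In block 404, the search engine returns search results that include the anchor text processing and other processing described in FIGS. 2, 3A, and 3B. In block 406, the viewer application 112 displays the search results.
Thus, certain implementations of the invention provide a technique for generating high-quality context information for search result items from a set of anchors. In certain implementations, an analysis of each document is performed to identify the language in which the document was written, a global analysis of all documents is performed to assign a static rank to each document, and anchors are sorted by target document to obtain, for each target document, a logical set of all anchors that point to that target document. For each set of anchors that point to a target document, the following may be performed: analyzing the distribution of source document languages; pruning anchors from the set based on the language distribution; noise filtering based on stop-word and URL detection; classifying each anchor according to the proximity of the source to the target; and assigning a weight to each proximity class. Additionally, each anchor may be scored based on linguistic analysis of its anchor text. Furthermore, relevance ordering of the remaining unique anchor texts (i.e., the same text may be on different anchors) may be performed based on the weighted sum of occurrences in each proximity class, the cumulative rank of all source documents, and the linguistic score of the text.
The results of the anchor text processing are high-quality titles, summaries, and other context information (e.g., the most likely language for each target). For search results where the target document is unavailable, this context information may be displayed to the user. If the target document itself is available, the generated context information may be used to enrich the information obtained from the target document (e.g., by finding similarities between the document and its anchors).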
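The described techniques for processing anchor text may be implemented as a method, apparatus, or article of manufacture using standard programming and/or engineering techniques to produce software, firmware, hardware, or any combination thereof. The term "article of manufacture" as used herein refers to code or logic implemented in hardware logic (e.g., an integrated circuit chip, Programmable Gate Array (PGA), Application Specific Integrated Circuit (ASIC), etc.) or in a computer-readable medium, such as magnetic storage media (e.g., hard disk drives, floppy disks, tape, etc.), optical storage (CD-ROMs, optical disks, etc.), or volatile and non-volatile memory devices (e.g., EEPROMs, ROMs, PROMs, RAMs, DRAMs, SRAMs, firmware, programmable logic, etc.). Code in the computer-readable medium is accessed and executed by a processor. The code in which various implementations are realized may further be accessible through a transmission medium or from a file server over a network. In such cases, the article of manufacture in which the code is implemented may comprise a transmission medium, such as a network transmission line, wireless transmission media, signals propagating through space, radio waves, infrared signals, etc. Thus, the "article of manufacture" may comprise the medium in which the code is embodied. Additionally, the "article of manufacture" may comprise a combination of hardware and software components in which the code is embodied, processed, and executed. Of course, those skilled in the art will recognize that many modifications may be made to this configuration without departing from the scope of the invention, and that the article of manufacture may comprise any information-bearing medium known in the art.
The logic of FIGS. 2, 3A, 3B, and 4 describes specific operations occurring in a particular order. In alternative implementations, certain of the logic operations may be performed in a different order, modified, or removed. Moreover, operations may be added to the above-described logic and still conform to the described implementations. Further, operations described herein may occur sequentially, certain operations may be processed in parallel, or operations described as performed by a single process may be performed by distributed processes.
The logic illustrated in FIGS. 2, 3A, 3B, and 4 may be implemented in software, hardware, programmable and non-programmable gate array logic, or in some combination of hardware, software, or gate array logic.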
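FIG. 5 illustrates an architecture of a computer system that may be used in accordance with certain implementations of the invention. For example, the client computer 100, the server computer 120, and/or the operator console 180 may implement computer architecture 500. The computer architecture 500 may implement a processor 502 (e.g., a microprocessor), a memory 504 (e.g., a volatile memory device), and storage 510 (e.g., a non-volatile storage area, such as magnetic disk drives, optical disk drives, a tape drive, etc.). An operating system 505 may execute in memory 504. The storage 510 may comprise an internal storage device or an attached or network-accessible storage. Computer programs 506 in storage 510 may be loaded into the memory 504 and executed by the processor 502 in a manner known in the art. The architecture further includes a network card 508 to enable communication with a network. An input device 512 is used to provide user input to the processor 502 and may include a keyboard, mouse, pen-stylus, microphone, touch-sensitive display screen, or any other activation or input mechanism known in the art. An output device 514 is capable of rendering information transmitted from the processor 502 or another component, such as a display monitor, printer, storage, etc. The computer architecture 500 of the computer system may include fewer components than illustrated, additional components not illustrated herein, or some combination of the components illustrated and additional components.
The computer architecture 500 may comprise any computing device known in the art, such as a mainframe, server, personal computer, workstation, laptop, hand-held computer, telephony device, network appliance, virtualization device, storage controller, etc. Any processor 502 and operating system 505 known in the art may be used.
The foregoing description of implementations of the invention has been presented for the purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise form disclosed. Many modifications and variations are possible in light of the above teaching. It is intended that the scope of the invention be limited not by this detailed description, but rather by the claims appended hereto. The above specification, examples, and data provide a complete description of the manufacture and use of the composition of the invention. Since many implementations of the invention can be made without departing from the spirit and scope of the invention, the invention resides in the claims hereinafter appended.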

Claims (19)

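1. A method for processing anchor text, comprising: forming a set of anchors that point to a target document; grouping together anchors that have the same anchor text; computing information for each group; and generating context information for the target document based on the computed information.
2. The method of claim 1, further comprising: determining a language of each document in a collection of documents; determining a rank of each document in the collection of documents; and determining a proximity class of each document in the collection of documents.
3. The method of claim 1 or claim 2, further comprising: determining a predominant language in the set of anchors; and pruning from the set anchors that are not in the predominant language.
4. The method of any one of claims 1, 2, or 3, further comprising: pruning from the set anchors that include at least a portion of a path to the target document.
5. The method of any preceding claim, further comprising: pruning anchors based on a configurable set of words.
6. The method of any preceding claim, wherein computing the information further comprises: computing a weighted sum of occurrences of the anchor text of the anchors in each group.
7. The method of any preceding claim, wherein computing the information further comprises: computing a cumulative rank for each group.
8. The method of any preceding claim, wherein computing the information further comprises: computing a linguistic score for each group.
9. The method of any preceding claim, wherein computing the information further comprises: generating a relevance score for each group.
10. A computer system including logic for processing anchor text, the logic comprising: forming a set of anchors that point to a target document; grouping together anchors that have the same anchor text; computing information for each group; and generating context information for the target document based on the computed information.
11. The computer system of claim 10, wherein the logic further comprises: determining a language of each document in a collection of documents; determining a rank of each document in the collection of documents; and determining a proximity class of each document in the collection of documents.
12. The computer system of claim 10 or claim 11, wherein the logic further comprises: determining a predominant language in the set of anchors; and pruning from the set anchors that are not in the predominant language.
13. The computer system of any one of claims 10 to 12, wherein the logic further comprises: pruning from the set anchors that include at least a portion of a path to the target document.
14. The computer system of any one of claims 10 to 13, wherein the logic further comprises: pruning anchors based on a configurable set of words.
15. The computer system of any one of claims 10 to 14, wherein the logic for computing information further comprises: computing a weighted sum of occurrences of the anchor text of the anchors in each group.
16. The computer system of any one of claims 10 to 15, wherein the logic for computing information further comprises: computing a cumulative rank for each group.
17. The computer system of any one of claims 10 to 16, wherein the logic for computing information further comprises: computing a linguistic score for each group.
18. The computer system of any one of claims 10 to 17, wherein the logic for computing information further comprises: generating a relevance score for each group.
19. A computer program for processing anchor text in documents, wherein the program causes operations to be performed in a data processing apparatus, the operations comprising: forming a set of anchors that point to a target document; grouping together anchors that have the same anchor text; computing information for each group; and generating context information for the target document based on the computed information.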
CNA2005800018061A 2004-01-26 2005-01-26 Method, system, and program for handling anchor text Pending CN1906614A (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US10/764,801 US7499913B2 (en) 2004-01-26 2004-01-26 Method for handling anchor text
US10/764,801 2004-01-26

Publications (1)

Publication Number Publication Date
CN1906614A true CN1906614A (zh) 2007-01-31

Family

ID=34795353

Family Applications (1)

Application Number Title Priority Date Filing Date
CNA2005800018061A Pending CN1906614A (zh) 2004-01-26 2005-01-26 Method, system, and program for handling anchor text

Country Status (5)

Country Link
US (2) US7499913B2 (zh)
EP (1) EP1714223A1 (zh)
JP (1) JP2007519111A (zh)
CN (1) CN1906614A (zh)
WO (1) WO2005071566A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107132963A (zh) * 2017-05-08 2017-09-05 深圳乐信软件技术有限公司 Red dot message display method, dismissal method, and corresponding devices
CN111625615A (zh) * 2019-02-27 2020-09-04 国际商业机器公司 Text extraction and processing

Families Citing this family (49)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7293005B2 (en) 2004-01-26 2007-11-06 International Business Machines Corporation Pipelined architecture for global analysis and index building
US7424467B2 (en) * 2004-01-26 2008-09-09 International Business Machines Corporation Architecture for an indexer with fixed width sort and variable width sort
US8296304B2 (en) 2004-01-26 2012-10-23 International Business Machines Corporation Method, system, and program for handling redirects in a search engine
US7499913B2 (en) * 2004-01-26 2009-03-03 International Business Machines Corporation Method for handling anchor text
US7584221B2 (en) * 2004-03-18 2009-09-01 Microsoft Corporation Field weighting in text searching
US7260573B1 (en) * 2004-05-17 2007-08-21 Google Inc. Personalizing anchor text scores in a search engine
US7461064B2 2004-09-24 2008-12-02 International Business Machines Corporation Method for searching documents for ranges of numeric values
US7606793B2 (en) 2004-09-27 2009-10-20 Microsoft Corporation System and method for scoping searches using index keys
US8276099B2 (en) * 2004-09-28 2012-09-25 David Arthur Yost System of GUI text cursor, caret, and selection
US7739277B2 (en) * 2004-09-30 2010-06-15 Microsoft Corporation System and method for incorporating anchor text into ranking search results
US7761448B2 (en) 2004-09-30 2010-07-20 Microsoft Corporation System and method for ranking search results using click distance
US7827181B2 (en) 2004-09-30 2010-11-02 Microsoft Corporation Click distance determination
US7716198B2 (en) * 2004-12-21 2010-05-11 Microsoft Corporation Ranking search results using feature extraction
US7769579B2 (en) 2005-05-31 2010-08-03 Google Inc. Learning facts from semi-structured text
US20060161553A1 (en) * 2005-01-19 2006-07-20 Tiny Engine, Inc. Systems and methods for providing user interaction based profiles
US20060161587A1 (en) * 2005-01-19 2006-07-20 Tiny Engine, Inc. Psycho-analytical system and method for audio and visual indexing, searching and retrieval
US20060161543A1 (en) * 2005-01-19 2006-07-20 Tiny Engine, Inc. Systems and methods for providing search results based on linguistic analysis
US7792833B2 (en) * 2005-03-03 2010-09-07 Microsoft Corporation Ranking search results using language types
US9208229B2 (en) * 2005-03-31 2015-12-08 Google Inc. Anchor text summarization for corroboration
US8417693B2 (en) 2005-07-14 2013-04-09 International Business Machines Corporation Enforcing native access control to indexed documents
US7599917B2 (en) * 2005-08-15 2009-10-06 Microsoft Corporation Ranking search results using biased click distance
US8095565B2 (en) * 2005-12-05 2012-01-10 Microsoft Corporation Metadata driven user interface
US8560942B2 (en) * 2005-12-15 2013-10-15 Microsoft Corporation Determining document layout between different views
US20070150477A1 (en) * 2005-12-22 2007-06-28 International Business Machines Corporation Validating a uniform resource locator ('URL') in a document
US20070260597A1 (en) * 2006-05-02 2007-11-08 Mark Cramer Dynamic search engine results employing user behavior
US8442973B2 (en) * 2006-05-02 2013-05-14 Surf Canyon, Inc. Real time implicit user modeling for personalized search
US8117197B1 (en) 2008-06-10 2012-02-14 Surf Canyon, Inc. Adaptive user interface for real-time search relevance feedback
US8458207B2 (en) * 2006-09-15 2013-06-04 Microsoft Corporation Using anchor text to provide context
US8122026B1 (en) 2006-10-20 2012-02-21 Google Inc. Finding and disambiguating references to entities on web pages
US7657507B2 (en) 2007-03-02 2010-02-02 Microsoft Corporation Pseudo-anchor text extraction for vertical search
US8347202B1 (en) 2007-03-14 2013-01-01 Google Inc. Determining geographic locations for place names in a fact repository
CN101399818B (zh) * 2007-09-25 2012-08-29 日电(中国)有限公司 Method and system for filtering topic-related web pages based on navigation path information
US7840569B2 (en) 2007-10-18 2010-11-23 Microsoft Corporation Enterprise relevancy ranking using a neural network
US9348912B2 (en) 2007-10-18 2016-05-24 Microsoft Technology Licensing, Llc Document length as a static relevance feature for ranking search results
US8812493B2 (en) 2008-04-11 2014-08-19 Microsoft Corporation Search results ranking using editing distance and document information
US20100318533A1 (en) * 2009-06-10 2010-12-16 Yahoo! Inc. Enriched document representations using aggregated anchor text
US8380722B2 (en) * 2010-03-29 2013-02-19 Microsoft Corporation Using anchor text with hyperlink structures for web searches
WO2011123981A1 (en) 2010-04-07 2011-10-13 Google Inc. Detection of boilerplate content
US8738635B2 (en) 2010-06-01 2014-05-27 Microsoft Corporation Detection of junk in search result ranking
US8793706B2 (en) 2010-12-16 2014-07-29 Microsoft Corporation Metadata-based eventing supporting operations on data
US9779385B2 (en) 2011-06-24 2017-10-03 Facebook, Inc. Inferring topics from social networking system communications
US9495462B2 (en) 2012-01-27 2016-11-15 Microsoft Technology Licensing, Llc Re-ranking search results
US10380606B2 (en) 2012-08-03 2019-08-13 Facebook, Inc. Negative signals for advertisement targeting
US9558233B1 (en) 2012-11-30 2017-01-31 Google Inc. Determining a quality measure for a resource
US9208233B1 (en) 2012-12-31 2015-12-08 Google Inc. Using synthetic descriptive text to rank search results
US9208232B1 (en) 2012-12-31 2015-12-08 Google Inc. Generating synthetic descriptive text
US20150169701A1 (en) * 2013-01-25 2015-06-18 Google Inc. Providing customized content in knowledge panels
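CN104965902A (zh) * 2015-06-30 2015-10-07 北京奇虎科技有限公司 Method and device for identifying enriched URLs
CN111680152B (zh) * 2020-06-10 2023-04-18 创新奇智(成都)科技有限公司 Method and apparatus for extracting a summary of target text, electronic device, and storage medium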

Family Cites Families (199)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6182062B1 (en) * 1986-03-26 2001-01-30 Hitachi, Ltd. Knowledge based information retrieval system
US4965763A (en) 1987-03-03 1990-10-23 International Business Machines Corporation Computer method for automatic extraction of commonly specified information from business correspondence
US5265221A (en) 1989-03-20 1993-11-23 Tandem Computers Access restriction facility method and apparatus
US5187790A (en) 1989-06-29 1993-02-16 Digital Equipment Corporation Server impersonation of client processes in an object based computer operating system
US5129152A (en) 1990-12-20 1992-07-14 Hughes Aircraft Company Fast contact measuring machine
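JP2943447B2 (ja) 1991-01-30 1999-08-30 三菱電機株式会社 Text information extraction device, text similarity matching device, text search system, text information extraction method, text similarity matching method, and question analysis device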
US5287496A (en) 1991-02-25 1994-02-15 International Business Machines Corporation Dynamic, finite versioning for concurrent transaction and query processing
US5423032A (en) 1991-10-31 1995-06-06 International Business Machines Corporation Method for extracting multi-word technical terms from text
US5685003A (en) 1992-12-23 1997-11-04 Microsoft Corporation Method and system for automatically indexing data in a document using a fresh index table
US5873097A (en) 1993-05-12 1999-02-16 Apple Computer, Inc. Update mechanism for computer storage container manager
US5638543A (en) 1993-06-03 1997-06-10 Xerox Corporation Method and apparatus for automatic document summarization
US5544352A (en) 1993-06-14 1996-08-06 Libertech, Inc. Method and apparatus for indexing, searching and displaying data
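JP3547098B2 (ja) 1994-06-06 2004-07-28 トヨタ自動車株式会社 Thermal spraying method, method for manufacturing a sliding member having a sprayed layer as its sliding surface, piston, and method for manufacturing a piston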
US5664172A (en) 1994-07-19 1997-09-02 Oracle Corporation Range-based query optimizer
US5903646A (en) 1994-09-02 1999-05-11 Rackman; Michael I. Access control system for litigation document production
US5574906A (en) 1994-10-24 1996-11-12 International Business Machines Corporation System and method for reducing storage requirement in backup subsystems utilizing segmented compression and differencing
US5729730A (en) 1995-03-28 1998-03-17 Dex Information Systems, Inc. Method and apparatus for improved information storage and retrieval system
US6182121B1 (en) 1995-02-03 2001-01-30 Enfish, Inc. Method and apparatus for a physical storage architecture having an improved information storage and retrieval system for a shared file environment
US5708825A (en) 1995-05-26 1998-01-13 Iconovex Corporation Automatic summary page creation and hyperlink generation
US5701469A (en) 1995-06-07 1997-12-23 Microsoft Corporation Method and system for generating accurate search results using a content-index
US5721938A (en) 1995-06-07 1998-02-24 Stuckey; Barbara K. Method and device for parsing and analyzing natural language sentences and text
US5794177A (en) 1995-07-19 1998-08-11 Inso Corporation Method and apparatus for morphological analysis and generation of natural language text
US5721939A (en) 1995-08-03 1998-02-24 Xerox Corporation Method and apparatus for tokenizing text
US6026388A (en) 1995-08-16 2000-02-15 Textwise, Llc User interface and other enhancements for natural language information retrieval system and method
US5963940A (en) 1995-08-16 1999-10-05 Syracuse University Natural language information retrieval system and method
JP3441306B2 (ja) 1995-09-12 2003-09-02 株式会社東芝 Client device, message transmission method, server device, page processing method, and relay server device
US5745906A (en) 1995-11-14 1998-04-28 Deltatech Research, Inc. Method and apparatus for merging delta streams to reconstruct a computer file
US5729743A (en) 1995-11-17 1998-03-17 Deltatech Research, Inc. Computer apparatus and method for merging system deltas
US5745904A (en) 1996-01-12 1998-04-28 Microsoft Corporation Buffered table user index
US5862325A (en) 1996-02-29 1999-01-19 Intermind Corporation Computer-based communication system and method using metadata defining a control structure
US5778378A (en) 1996-04-30 1998-07-07 International Business Machines Corporation Object oriented information retrieval framework mechanism
JP3108015B2 (ja) * 1996-05-22 2000-11-13 松下電器産業株式会社 Hypertext search device
JP3061765B2 (ja) 1996-05-23 2000-07-10 ゼロックス コーポレイション Computer-based document processing method
US5920859A (en) * 1997-02-05 1999-07-06 Idd Enterprises, L.P. Hypertext document retrieval system and method
WO1997049048A1 (en) * 1996-06-17 1997-12-24 Idd Enterprises, L.P. Hypertext document retrieval system and method
US5909677A (en) 1996-06-18 1999-06-01 Digital Equipment Corporation Method for determining the resemblance of documents
US5832480A (en) * 1996-07-12 1998-11-03 International Business Machines Corporation Using canonical forms to develop a dictionary of names in a text
US5995980A (en) 1996-07-23 1999-11-30 Olson; Jack E. System and method for database update replication
US5832500A (en) 1996-08-09 1998-11-03 Digital Equipment Corporation Method for searching an index
US5745900A (en) 1996-08-09 1998-04-28 Digital Equipment Corporation Method for indexing duplicate database records using a full-record fingerprint
US5852820A (en) 1996-08-09 1998-12-22 Digital Equipment Corporation Method for optimizing entries for searching an index
US5745898A (en) 1996-08-09 1998-04-28 Digital Equipment Corporation Method for generating a compressed index of information of records of a database
US5765168A (en) 1996-08-09 1998-06-09 Digital Equipment Corporation Method for maintaining an index
US5797008A (en) 1996-08-09 1998-08-18 Digital Equipment Corporation Memory storing an integrated index of database records
US5765149A (en) 1996-08-09 1998-06-09 Digital Equipment Corporation Modified collection frequency ranking method
US5745889A (en) 1996-08-09 1998-04-28 Digital Equipment Corporation Method for parsing information of databases records using word-location pairs and metaword-location pairs
US5787435A (en) 1996-08-09 1998-07-28 Digital Equipment Corporation Method for mapping an index of a database into an array of files
US5864863A (en) 1996-08-09 1999-01-26 Digital Equipment Corporation Method for parsing, indexing and searching world-wide-web pages
US5745894A (en) 1996-08-09 1998-04-28 Digital Equipment Corporation Method for generating and searching a range-based index of word-locations
US5765158A (en) 1996-08-09 1998-06-09 Digital Equipment Corporation Method for sampling a compressed index to create a summarized index
US5809502A (en) 1996-08-09 1998-09-15 Digital Equipment Corporation Object-oriented interface for an index
US5745890A (en) 1996-08-09 1998-04-28 Digital Equipment Corporation Sequential searching of a database index using constraints on word-location pairs
US5765150A (en) 1996-08-09 1998-06-09 Digital Equipment Corporation Method for statistically projecting the ranking of information
US5745899A (en) 1996-08-09 1998-04-28 Digital Equipment Corporation Method for indexing information of a database
US5724033A (en) 1996-08-09 1998-03-03 Digital Equipment Corporation Method for encoding delta values
JP2001505330A (ja) 1996-08-22 2001-04-17 ルノー・アンド・オスピー・スピーチ・プロダクツ・ナームローゼ・ベンノートシャープ Method and apparatus for providing word breaks in a text stream
US5924091A (en) 1996-08-28 1999-07-13 Sybase, Inc. Database system with improved methods for radix sorting
US6078914A (en) * 1996-12-09 2000-06-20 Open Text Corporation Natural language meta-search system and method
US6285999B1 (en) 1997-01-10 2001-09-04 The Board Of Trustees Of The Leland Stanford Junior University Method for node ranking in a linked database
JP3579204B2 (ja) 1997-01-17 2004-10-20 富士通株式会社 Document summarization apparatus and method
US5903891A (en) 1997-02-25 1999-05-11 Hewlett-Packard Company Hierarchial information processes that share intermediate data and formulate contract data
US6278992B1 (en) 1997-03-19 2001-08-21 John Andrew Curtis Search engine using indexing method for storing and retrieving data
JP4243344B2 (ja) * 1997-05-23 2009-03-25 株式会社Access Mobile communication device
US5884305A (en) 1997-06-13 1999-03-16 International Business Machines Corporation System and method for data mining from relational data by sieving through iterated relational reinforcement
EP0884688A3 (en) 1997-06-16 2005-06-22 Koninklijke Philips Electronics N.V. Sparse index search method
US5933822A (en) 1997-07-22 1999-08-03 Microsoft Corporation Apparatus and methods for an information retrieval system that employs natural language processing of search results to improve overall precision
US6078916A (en) * 1997-08-01 2000-06-20 Culliss; Gary Method for organizing information
US6026413A (en) 1997-08-01 2000-02-15 International Business Machines Corporation Determining how changes to underlying data affect cached objects
US7031954B1 (en) 1997-09-10 2006-04-18 Google, Inc. Document retrieval system with access control
US5974412A (en) 1997-09-24 1999-10-26 Sapient Health Network Intelligent query system for automatically indexing information in a database and automatically categorizing users
US6594682B2 (en) 1997-10-28 2003-07-15 Microsoft Corporation Client-side system for scheduling delivery of web content and locally managing the web content
US6061678A (en) 1997-10-31 2000-05-09 Oracle Corporation Approach for managing access to large objects in database systems using large object indexes
US6029165A (en) 1997-11-12 2000-02-22 Arthur Andersen Llp Search and retrieval information system and method
KR100285265B1 (ko) 1998-02-25 2001-04-02 윤덕용 Inverted index storage structure using sub-indexes and large objects for tight coupling of a database management system and information retrieval
US6005503A (en) 1998-02-27 1999-12-21 Digital Equipment Corporation Method for encoding and decoding a list of variable size integers to reduce branch mispredicts
US6016501A (en) 1998-03-18 2000-01-18 Bmc Software Enterprise data movement system and method which performs data load and changed data propagation operations
US6119124A (en) 1998-03-26 2000-09-12 Digital Equipment Corporation Method for clustering closely resembling data objects
US6088694A (en) 1998-03-31 2000-07-11 International Business Machines Corporation Continuous availability and efficient backup for externally referenced objects
US6374268B1 (en) 1998-04-14 2002-04-16 Hewlett-Packard Company Methods and systems for an incremental file system
US6192333B1 (en) 1998-05-12 2001-02-20 Microsoft Corporation System for creating a dictionary
US6212522B1 (en) 1998-05-15 2001-04-03 International Business Machines Corporation Searching and conditionally serving bookmark sets based on keywords
US6205451B1 (en) 1998-05-22 2001-03-20 Oracle Corporation Method and apparatus for incremental refresh of summary tables in a database system
AU4196299A (en) 1998-05-23 1999-12-13 Eolas Technologies, Incorporated Identification of features of multi-dimensional image data in hypermedia systems
US6216175B1 (en) 1998-06-08 2001-04-10 Microsoft Corporation Method for upgrading copies of an original file with same update data after normalizing differences between copies created during respective original installations
US7024623B2 (en) 1998-06-17 2006-04-04 Microsoft Corporation Method and system for placing an insertion point in an electronic document
EP0981099A3 (en) 1998-08-17 2004-04-21 Connected Place Limited A method of and an apparatus for merging a sequence of delta files
US6243713B1 (en) 1998-08-24 2001-06-05 Excalibur Technologies Corp. Multimedia document retrieval by application of multimedia queries to a unified index of multimedia data for a plurality of multimedia data types
US6334131B2 (en) * 1998-08-29 2001-12-25 International Business Machines Corporation Method for cataloging, filtering, and relevance ranking frame-based hierarchical information structures
GB9818819D0 (en) 1998-08-29 1998-10-21 Int Computers Ltd Time-versioned data storage mechanism
US6308179B1 (en) 1998-08-31 2001-10-23 Xerox Corporation User level controlled mechanism inter-positioned in a read/write path of a property-based document management system
US6553385B2 (en) 1998-09-01 2003-04-22 International Business Machines Corporation Architecture of a framework for information extraction from natural language documents
US6519597B1 (en) 1998-10-08 2003-02-11 International Business Machines Corporation Method and apparatus for indexing structured documents with rich data types
US6336122B1 (en) * 1998-10-15 2002-01-01 International Business Machines Corporation Object oriented class archive file maker and method
US6519593B1 (en) 1998-12-15 2003-02-11 Yossi Matias Efficient bundle sorting
CA2256934C (en) 1998-12-23 2002-04-02 Hamid Bacha System for electronic repository of data enforcing access control on data retrieval
US6295529B1 (en) 1998-12-24 2001-09-25 Microsoft Corporation Method and apparatus for indentifying clauses having predetermined characteristics indicative of usefulness in determining relationships between different texts
US6381602B1 (en) 1999-01-26 2002-04-30 Microsoft Corporation Enforcing access control on resources at a location other than the source location
US6418433B1 (en) 1999-01-28 2002-07-09 International Business Machines Corporation System and method for focussed web crawling
US6584458B1 (en) 1999-02-19 2003-06-24 Novell, Inc. Method and apparatuses for creating a full text index accommodating child words
US6438535B1 (en) 1999-03-18 2002-08-20 Lockheed Martin Corporation Relational database method for accessing information useful for the manufacture of, to interconnect nodes in, to repair and to maintain product and system units
US6631496B1 (en) * 1999-03-22 2003-10-07 Nec Corporation System for personalizing, organizing and managing web information
US6393415B1 (en) 1999-03-31 2002-05-21 Verizon Laboratories Inc. Adaptive partitioning techniques in performing query requests and request routing
US6336117B1 (en) 1999-04-30 2002-01-01 International Business Machines Corporation Content-indexing search system and method providing search results consistent with content filtering and blocking policies implemented in a blocking engine
US6269361B1 (en) 1999-05-28 2001-07-31 Goto.Com System and method for influencing a position on a search result list generated by a computer network search engine
JP2000339309A (ja) 1999-05-31 2000-12-08 Sony Corp Character string analysis device, character string analysis method, and providing medium
US7472349B1 (en) 1999-06-01 2008-12-30 Oracle International Corporation Dynamic services infrastructure for allowing programmatic access to internet and other resources
US6421655B1 (en) 1999-06-04 2002-07-16 Microsoft Corporation Computer-based representations and reasoning methods for engaging users in goal-oriented conversations
US6631369B1 (en) * 1999-06-30 2003-10-07 Microsoft Corporation Method and system for incremental web crawling
US6547829B1 (en) 1999-06-30 2003-04-15 Microsoft Corporation Method and system for detecting duplicate documents in web crawls
US6339772B1 (en) 1999-07-06 2002-01-15 Compaq Computer Corporation System and method for performing database operations on a continuous stream of tuples
US6463439B1 (en) 1999-07-15 2002-10-08 American Management Systems, Incorporated System for accessing database tables mapped into memory for high performance data retrieval
US7065784B2 (en) 1999-07-26 2006-06-20 Microsoft Corporation Systems and methods for integrating access control with a namespace
US6587458B1 (en) * 1999-08-04 2003-07-01 At&T Corporation Method and apparatus for an internet Caller-ID delivery plus service
US6754873B1 (en) * 1999-09-20 2004-06-22 Google Inc. Techniques for finding related hyperlinked documents using link-based analysis
US8914361B2 (en) * 1999-09-22 2014-12-16 Google Inc. Methods and systems for determining a meaning of a document to match the document to content
US6665666B1 (en) * 1999-10-26 2003-12-16 International Business Machines Corporation System, method and program product for answering questions using a search engine
JP2001134575A (ja) 1999-10-29 2001-05-18 Internatl Business Mach Corp <Ibm> Frequent pattern detection method and system
US6507846B1 (en) 1999-11-09 2003-01-14 Joint Technology Corporation Indexing databases for efficient relational querying
US6665657B1 (en) * 1999-11-19 2003-12-16 Niku Corporation Method and system for cross browsing of various multimedia data sources in a searchable repository
US6839702B1 (en) 1999-12-15 2005-01-04 Google Inc. Systems and methods for highlighting search results
US6725214B2 (en) 2000-01-14 2004-04-20 Dotnsf Apparatus and method to support management of uniform resource locators and/or contents of database servers
US6678409B1 (en) * 2000-01-14 2004-01-13 Microsoft Corporation Parameterized word segmentation of unsegmented text
US6615209B1 (en) 2000-02-22 2003-09-02 Google, Inc. Detecting query-specific duplicate documents
US20020032677A1 (en) 2000-03-01 2002-03-14 Jeff Morgenthaler Methods for creating, editing, and updating searchable graphical database and databases of graphical images and information and displaying graphical images from a searchable graphical database or databases in a sequential or slide show format
US6985948B2 (en) * 2000-03-29 2006-01-10 Fujitsu Limited User's right information and keywords input based search query generating means method and apparatus for searching a file
US6658406B1 (en) * 2000-03-29 2003-12-02 Microsoft Corporation Method for selecting terms from vocabularies in a category-based system
FR2807537B1 (fr) 2000-04-06 2003-10-17 France Telecom Hypermedia resource search engine and associated indexing method
US7173912B2 (en) 2000-05-05 2007-02-06 Fujitsu Limited Method and system for modeling and advertising asymmetric topology of a node in a transport network
US6850979B1 (en) 2000-05-09 2005-02-01 Sun Microsystems, Inc. Message gates in a distributed computing environment
US6643650B1 (en) * 2000-05-09 2003-11-04 Sun Microsystems, Inc. Mechanism and apparatus for using messages to look up documents stored in spaces in a distributed computing environment
US6868447B1 (en) 2000-05-09 2005-03-15 Sun Microsystems, Inc. Mechanism and apparatus for returning results of services in a distributed computing environment
US6789077B1 (en) 2000-05-09 2004-09-07 Sun Microsystems, Inc. Mechanism and apparatus for web-based searching of URI-addressable repositories in a distributed computing environment
SE517005C2 (sv) 2000-05-31 2002-04-02 Hapax Information Systems Ab Text segmentation
US20010049671A1 (en) 2000-06-05 2001-12-06 Joerg Werner B. e-Stract: a process for knowledge-based retrieval of electronic information
SE517496C2 (sv) 2000-06-22 2002-06-11 Hapax Information Systems Ab Method and system for information extraction
US6839665B1 (en) 2000-06-27 2005-01-04 Text Analysis International, Inc. Automated generation of text analysis systems
US6567804B1 (en) 2000-06-27 2003-05-20 Ncr Corporation Shared computation of user-defined metrics in an on-line analytic processing system
US6578032B1 (en) 2000-06-28 2003-06-10 Microsoft Corporation Method and system for performing phrase/word clustering and cluster merging
US6865575B1 (en) 2000-07-06 2005-03-08 Google, Inc. Methods and apparatus for using a modified index to provide search results in response to an ambiguous search query
US20030217052A1 (en) 2000-08-24 2003-11-20 Celebros Ltd. Search engine method and apparatus
US6701317B1 (en) 2000-09-19 2004-03-02 Overture Services, Inc. Web page connectivity server construction
JP4649731B2 (ja) * 2000-11-27 2011-03-16 NEC Corporation Document summarization system and document summarization method
US6633872B2 (en) * 2000-12-18 2003-10-14 International Business Machines Corporation Extendible access control for lightweight directory access protocol
US20030028564A1 (en) 2000-12-19 2003-02-06 Lingomotors, Inc. Natural language method and system for matching and ranking documents in terms of semantic relatedness
US6907423B2 (en) * 2001-01-04 2005-06-14 Sun Microsystems, Inc. Search engine interface and method of controlling client searches
US7356530B2 (en) 2001-01-10 2008-04-08 Looksmart, Ltd. Systems and methods of retrieving relevant information
US6766316B2 (en) 2001-01-18 2004-07-20 Science Applications International Corporation Method and system of ranking and clustering for document indexing and retrieval
US6658423B1 (en) * 2001-01-24 2003-12-02 Google, Inc. Detecting duplicate and near-duplicate files
US20020165707A1 (en) 2001-02-26 2002-11-07 Call Charles G. Methods and apparatus for storing and processing natural language text data as a sequence of fixed length integers
SE520533C2 (sv) 2001-03-13 2003-07-22 Picsearch Ab Method, computer program and system for indexing digitized entities
US6904454B2 (en) * 2001-03-21 2005-06-07 Nokia Corporation Method and apparatus for content repository with versioning and data modeling
US7509492B2 (en) 2001-03-27 2009-03-24 Microsoft Corporation Distributed scalable cryptographic access control
US6990634B2 (en) 2001-04-27 2006-01-24 The United States Of America As Represented By The National Security Agency Method of summarizing text by sentence extraction
US20020169770A1 (en) * 2001-04-27 2002-11-14 Kim Brian Seong-Gon Apparatus and method that categorize a collection of documents into a hierarchy of categories that are defined by the collection of documents
US6999971B2 (en) 2001-05-08 2006-02-14 Verity, Inc. Apparatus and method for parametric group processing
US7299219B2 (en) 2001-05-08 2007-11-20 The Johns Hopkins University High refresh-rate retrieval of freshly published content using distributed crawling
US20030046311A1 (en) 2001-06-19 2003-03-06 Ryan Baidya Dynamic search engine and database
US6622211B2 (en) * 2001-08-15 2003-09-16 Ip-First, L.L.C. Virtual set cache that redirects store data to correct virtual set to avoid virtual set store miss penalty
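JP3557605B2 (ja) 2001-09-19 2004-08-25 International Business Machines Corporation Sentence segmentation method, and sentence segmentation processing device, machine translation device, and program using the same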
US6877136B2 (en) 2001-10-26 2005-04-05 United Services Automobile Association (Usaa) System and method of providing electronic access to one or more documents
US6763362B2 (en) * 2001-11-30 2004-07-13 Micron Technology, Inc. Method and system for updating a search engine
US7249034B2 (en) 2002-01-14 2007-07-24 International Business Machines Corporation System and method for publishing a person's affinities
US6829606B2 (en) * 2002-02-14 2004-12-07 Infoglide Software Corporation Similarity search engine for use with relational databases
US7949648B2 (en) 2002-02-26 2011-05-24 Soren Alain Mortensen Compiling and accessing subject-specific information from a computer network
US7243301B2 (en) 2002-04-10 2007-07-10 Microsoft Corporation Common annotation framework
US20030225763A1 (en) 2002-04-15 2003-12-04 Microsoft Corporation Self-improving system and method for classifying pages on the world wide web
US7080091B2 (en) 2002-05-09 2006-07-18 Oracle International Corporation Inverted index system and method for numeric attributes
US7096208B2 (en) 2002-06-10 2006-08-22 Microsoft Corporation Large margin perceptrons for document categorization
US20040128615A1 (en) 2002-12-27 2004-07-01 International Business Machines Corporation Indexing and querying semi-structured documents
US7051023B2 (en) * 2003-04-04 2006-05-23 Yahoo! Inc. Systems and methods for generating concept units from search queries
US7197497B2 (en) * 2003-04-25 2007-03-27 Overture Services, Inc. Method and apparatus for machine learning a document relevance function
US7516146B2 (en) 2003-05-15 2009-04-07 Microsoft Corporation Fast adaptive document filtering
US7139752B2 (en) 2003-05-30 2006-11-21 International Business Machines Corporation System, method and computer program product for performing unstructured information management and automatic text analysis, and providing multiple document views derived from different document tokenizations
US20040243560A1 (en) 2003-05-30 2004-12-02 International Business Machines Corporation System, method and computer program product for performing unstructured information management and automatic text analysis, including an annotation inverted file system facilitating indexing and searching
US7146361B2 (en) 2003-05-30 2006-12-05 International Business Machines Corporation System, method and computer program product for performing unstructured information management and automatic text analysis, including a search operator functioning as a Weighted AND (WAND)
US20040243556A1 (en) 2003-05-30 2004-12-02 International Business Machines Corporation System, method and computer program product for performing unstructured information management and automatic text analysis, and including a document common analysis system (CAS)
US20040243554A1 (en) 2003-05-30 2004-12-02 International Business Machines Corporation System, method and computer program product for performing unstructured information management and automatic text analysis
US7188254B2 (en) 2003-08-20 2007-03-06 Microsoft Corporation Peer-to-peer authorization method
US6934634B1 (en) 2003-09-22 2005-08-23 Google Inc. Address geocoding
US6906920B1 (en) 2003-09-29 2005-06-14 Google Inc. Drive cooling baffle
US6870095B1 (en) 2003-09-29 2005-03-22 Google Inc. Cable management for rack mounted computing system
US6845009B1 (en) 2003-09-30 2005-01-18 Google Inc. Cooling baffle and fan mount apparatus
US7849063B2 (en) 2003-10-17 2010-12-07 Yahoo! Inc. Systems and methods for indexing content for fast and scalable retrieval
US7620624B2 (en) 2003-10-17 2009-11-17 Yahoo! Inc. Systems and methods for indexing content for fast and scalable retrieval
US20050144241A1 (en) 2003-10-17 2005-06-30 Stata Raymond P. Systems and methods for a search-based email client
US7693824B1 (en) 2003-10-20 2010-04-06 Google Inc. Number-range search system and method
US20050149499A1 (en) 2003-12-30 2005-07-07 Google Inc., A Delaware Corporation Systems and methods for improving search quality
US8150824B2 (en) 2003-12-31 2012-04-03 Google Inc. Systems and methods for direct navigation to specific portion of target document
US20050149851A1 (en) 2003-12-31 2005-07-07 Google Inc. Generating hyperlinks and anchor text in HTML and non-HTML documents
US7424467B2 (en) 2004-01-26 2008-09-09 International Business Machines Corporation Architecture for an indexer with fixed width sort and variable width sort
US7499913B2 (en) 2004-01-26 2009-03-03 International Business Machines Corporation Method for handling anchor text
US8296304B2 (en) 2004-01-26 2012-10-23 International Business Machines Corporation Method, system, and program for handling redirects in a search engine
US7293005B2 (en) 2004-01-26 2007-11-06 International Business Machines Corporation Pipelined architecture for global analysis and index building
US7318075B2 (en) 2004-02-06 2008-01-08 Microsoft Corporation Enhanced tabular data stream protocol
US8688143B2 (en) 2004-08-24 2014-04-01 Qualcomm Incorporated Location based service (LBS) system and method for creating a social network
US7461064B2 (en) 2004-09-24 2008-12-02 International Business Machines Corporation Method for searching documents for ranges of numeric values
US20060129538A1 (en) 2004-12-14 2006-06-15 Andrea Baader Text search quality by exploiting organizational information
US8417693B2 (en) 2005-07-14 2013-04-09 International Business Machines Corporation Enforcing native access control to indexed documents
US7840542B2 (en) 2006-02-06 2010-11-23 International Business Machines Corporation Method and system for controlling access to semantic web statements

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
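CN107132963A (zh) * 2017-05-08 2017-09-05 Shenzhen Lexin Software Technology Co., Ltd. Red dot message display method, clearing method, and corresponding devices
CN107132963B (zh) * 2017-05-08 2020-09-08 Shenzhen Lexin Software Technology Co., Ltd. Red dot message display method, clearing method, and corresponding devices
CN111625615A (zh) * 2019-02-27 2020-09-04 International Business Machines Corporation Text extraction and processing
CN111625615B (zh) * 2019-02-27 2023-08-01 International Business Machines Corporation Method and system for processing text data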

Also Published As

Publication number Publication date
JP2007519111A (ja) 2007-07-12
US8285724B2 (en) 2012-10-09
US20050165781A1 (en) 2005-07-28
US7499913B2 (en) 2009-03-03
WO2005071566A1 (en) 2005-08-04
EP1714223A1 (en) 2006-10-25
US20090083270A1 (en) 2009-03-26

Similar Documents

Publication Publication Date Title
CN1906614A (zh) Method, system, and program for handling anchor text
AU2010343183B2 (en) Search suggestion clustering and presentation
JP5492187B2 (ja) Search result ranking using edit distance and document information
Adar et al. The web changes everything: understanding the dynamics of web content
US7783626B2 (en) Pipelined architecture for global analysis and index building
Guerbas et al. Effective web log mining and online navigational pattern prediction
US9092510B1 (en) Modifying search result ranking based on a temporal element of user feedback
US8898150B1 (en) Collecting image search event information
US20170068740A1 (en) Method and system for web searching
US20070100797A1 (en) Indication of exclusive items in a result set
US20090287676A1 (en) Search results with word or phrase index
KR20070098521A (ko) System and method for prioritizing websites during a web crawling process
US20090303238A1 (en) Identifying on a graphical depiction candidate points and top-moving queries
WO2012051470A1 (en) Systems and methods for using a behavior history of a user to augment content of a webpage
KR20110009198A (ko) Search results with most-clicked next objects
US20100179953A1 (en) Information presentation system, information presentation method, and program for information presentation
EP1993045A1 (en) Electronic document retrieval system
EP1975816A1 (en) Electronic document retrieval system
US20050165800A1 (en) Method, system, and program for handling redirects in a search engine
US20020152242A1 (en) System for monitoring the usage of intranet portal modules
Sharma et al. Web search result optimization by mining the search engine query logs
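KR100667917B1 (ko) Method and system for providing a website search service
JP2003271648A (ja) Search device, search method, and program
KR100942902B1 (ko) Web page search method and computer-readable recording medium storing a program for implementing the method on a computer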
Pandian et al. A Unified Model for Preprocessing and Clustering Technique for Web Usage Mining.

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication