CN1906614A - Method, System, and Program for Handling Anchor Text - Google Patents
- Publication number: CN1906614A
- Authority: CN (China)
- Prior art keywords: document, information, anchor point, logic, anchor
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G—PHYSICS
  - G06—COMPUTING; CALCULATING OR COUNTING
    - G06F—ELECTRIC DIGITAL DATA PROCESSING
      - G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
        - G06F16/90—Details of database functions independent of the retrieved data types
          - G06F16/95—Retrieval from the web
            - G06F16/951—Indexing; Web crawling techniques
            - G06F16/955—Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
              - G06F16/9558—Details of hyperlinks; Management of linked annotations
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
  - Y10—TECHNICAL SUBJECTS COVERED BY FORMER USPC
    - Y10S—TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
      - Y10S707/00—Data processing: database and file management or data structures
        - Y10S707/99931—Database or file accessing
          - Y10S707/99933—Query processing, i.e. searching
            - Y10S707/99935—Query augmenting and refining, e.g. inexact access
Abstract
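Disclosed are a method, system, and program for processing anchor text for information retrieval. A set of anchors pointing to a target document is formed. Anchors having the same anchor text are grouped together. Information is computed for each group. Context information for the target document is generated based on the computed information.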
Description
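Technical Field
The present invention relates to processing anchor text for information retrieval.
Background
The World Wide Web (also known as the WWW or the "web") is a collection of Internet servers that support web pages, which may include links to other web pages. A Uniform Resource Locator (URL) indicates the location of a web page. Each web page may contain, for example, text, graphics, audio, and/or video content. For example, a first web page may contain a link to a second web page; when that link is selected in the first web page, the second web page is typically displayed.
A web browser is a software application used to locate and display web pages. Currently, there are billions of web pages on the web.
Web search engines are used to retrieve web pages on the web according to some criteria (e.g., entered through a web browser). That is, a web search engine is designed to return relevant web pages given a keyword query. For example, the query "HR" issued to a company intranet search engine is expected to return relevant pages on that intranet about human resources (HR). Web search engines use indexing techniques that associate search terms (e.g., keywords) with web pages.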
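An anchor may be described as a link or path (e.g., a URL) to a document. Anchor text may be described as the text associated with a path or link (e.g., a URL) that points to a document. For example, anchor text may be the text that labels or encloses a hypertext link in a web document. Web search engines collect anchor text and associate it with the target document, and the anchor text is indexed together with the target document.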
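To make the definition concrete, the following Python sketch pulls (link target, anchor text) pairs out of an HTML page using the standard-library `html.parser` module. It is an illustration only, not the extraction code of the described system. The names `AnchorExtractor` and `extract_anchors` are hypothetical, and relative-URL resolution (for example via `<base>`) is deliberately left out.

```python
from html.parser import HTMLParser


class AnchorExtractor(HTMLParser):
    """Collect (href, anchor_text) pairs from <a href="..."> elements."""

    def __init__(self):
        super().__init__()
        self.anchors = []   # collected (href, text) tuples
        self._href = None   # href of the <a> element currently open, if any
        self._parts = []    # text fragments seen inside that element

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = href
                self._parts = []

    def handle_data(self, data):
        if self._href is not None:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse whitespace runs so "Human\n  Resources" becomes "Human Resources".
            text = " ".join("".join(self._parts).split())
            self.anchors.append((self._href, text))
            self._href = None


def extract_anchors(html):
    """Return the list of (href, anchor_text) pairs found in an HTML string."""
    parser = AnchorExtractor()
    parser.feed(html)
    parser.close()
    return parser.anchors
```

For example, `extract_anchors('<a href="/hr/">Human Resources</a> <a href="/hr/">click here</a>')` returns both pairs, including the navigational "click here" text that has no value as a title.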
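Web search engines enrich search results with context information (e.g., title, summary, language), which gives users filtered search results. However, anchor text may not be suitable for use as context information. For example, anchor text may be in a different language from the target document, and using anchor text without further processing may result in, for example, a Japanese title for an English document. In addition, anchor text may be unrelated to the content of the document. For example, anchor text may consist of common words that occur frequently and serve mainly for navigation but have no meaningful value as a title (e.g., "click here"). Moreover, anchor text may be inaccurate or impolite, or may contain slang (e.g., an anchor to a "Network Security Guide" with the anchor text "Looking for trouble?").
Furthermore, generating context information is very difficult when the content of a web page cannot be retrieved (e.g., because of a server outage, incomplete retrieval of web pages for processing by the search engine, or access forbidden by robots.txt) or when a document is retrieved but cannot be analyzed (e.g., because the file is a video, audio, or multimedia file, is in an unknown or unsupported format, is malformed, or is password-protected).
In such cases, most search engines display only the Uniform Resource Locator (URL), without any of the web page's content. This makes it hard for users to grasp what a search result is about without viewing the web page itself.
Thus, there is a need for improved document processing to provide context information for documents such as web pages.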
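Summary of the Invention
Provided are a method, system, and program for processing anchor text. A set of anchors pointing to a target document is formed. Anchors having the same anchor text are grouped together. Information is computed for each group. Context information for the target document is generated based on the computed information.
Brief Description of the Drawings
Referring now to the drawings, in which like reference numbers represent corresponding parts throughout:
Figure 1 illustrates, in a block diagram, a computing environment in accordance with certain implementations of the invention.
Figure 2 illustrates logic implemented to prepare anchors for processing in accordance with certain implementations of the invention.
Figures 3A and 3B illustrate logic implemented to process anchor text in accordance with certain implementations of the invention.
Figure 4 illustrates logic for performing a document search in accordance with certain implementations of the invention.
Figure 5 illustrates an architecture of a computer system that may be used in accordance with certain implementations of the invention.
Detailed Description
In the following description, reference is made to the accompanying drawings, which form a part hereof and which illustrate several implementations of the invention. It is understood that other implementations may be utilized and that structural and operational changes may be made without departing from the scope of the invention.
Certain implementations of the invention make documents available for searching by indexing anchor text instead of, or in addition to, their content. Certain implementations generate context information from the anchor text of the anchors that point to a document. For example, at least a portion of the anchor text may be designated as the title or summary of the document. However, identifying meaningful anchor text can be difficult, because the anchor text may be in a different language from the target document, may be unrelated to the document's content, or may be inaccurate, impolite, or contain slang. In addition, special care is needed to remove noise from the anchor text, that is, text with no meaningful value as, for example, a title (e.g., URLs, file names, and navigational text such as "next").
Therefore, certain implementations of the invention process raw anchor text to obtain high-quality titles and summaries. They refine the raw anchor text into high-quality data that can be used to generate title or summary data for search result items. Processing raw anchor text in this way improves overall search quality and therefore the user's experience with the document retrieval system.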
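Figure 1 illustrates, in a block diagram, a computing environment in accordance with certain implementations of the invention. A client computer 100 is connected via a network 190 to a server computer 120. The client computer 100 may comprise any computing device known in the art, such as a server, mainframe, workstation, personal computer, handheld computer, laptop, telephony device, network appliance, etc. The network 190 may comprise any type of network, such as a Storage Area Network (SAN), a Local Area Network (LAN), a Wide Area Network (WAN), the Internet, an intranet, etc. The client computer 100 includes system memory 104, which may be implemented in volatile and/or non-volatile devices. One or more client applications 110 and a viewer application 112 may execute in the system memory 104. The viewer application 112 provides an interface that enables searching a set of documents (e.g., stored in one or more data stores 170). In certain implementations, the viewer application 112 is a web browser.
The server computer 120 includes system memory 122, which may be implemented in volatile and/or non-volatile devices. A search engine 130 executes in the system memory 122. In certain implementations, the search engine includes a crawler component 132, a static rank component 134, a document analysis component 136, a duplicate detection component 138, an anchor text component 140, and an indexing component 142. The anchor text component 140 includes a context information generator 141. Although components 132, 134, 136, 138, 140, 141, and 142 are illustrated as separate components, their functionality may be implemented in fewer, more, or different components than illustrated. Additionally, the functionality of these components may be implemented at a web application server computer or at another server computer connected to the server computer 120. One or more server applications 160 also execute in the system memory 122.
The server computer 120 provides the client computer 100 with access to data in at least one data store 170 (e.g., a database). Although a single data store 170 is illustrated for ease of understanding, data in the data store 170 may be stored in data stores at other computers connected to the server computer 120.
Also, an operator console 180 executes one or more applications 182 and is used to access the server computer 120 and the data store 170.
The data store 170 may comprise an array of storage devices, such as Direct Access Storage Devices (DASDs), Just a Bunch of Disks (JBOD), a Redundant Array of Independent Disks (RAID), a virtualization device, etc. The data store 170 includes data that is used with certain implementations of the invention.
Figure 2 illustrates logic implemented to prepare anchors for processing in accordance with certain implementations of the invention. Control begins at block 200, where anchor text is associated with each anchor. This may be done, for example, by the users who create the anchors. An anchor may be described as a path or link (e.g., a URL) from a source document to a target document.
In block 202, documents to be indexed by the search engine 130 are obtained. In certain implementations, the documents are published or pushed to the indexing component 142 (e.g., as may be the case with newspapers). In certain implementations, the crawler component 132 discovers, fetches, and stores the documents. In certain implementations, the crawler component 132 may discover documents based on certain criteria (e.g., documents accessed within the last month). Additionally, the crawler component 132 may discover documents in one or more data stores connected to the server computer 120 either directly (e.g., data store 170) or indirectly (e.g., through other computing devices, not shown). In certain implementations, the crawler component 132 discovers, fetches, and stores web pages in the data store 170. These stored documents may be referred to as a "document collection".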
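In block 204, the document analysis component 136 performs per-document analysis. In particular, the document analysis component 136 reviews the stored documents, parses and tokenizes them, determines for each document the language in which it is written, extracts anchor text, and performs other tasks such as document categorization and ranking. The language information is stored for later use. For example, the document analysis component 136 determines whether the primary language of a document is English, Japanese, German, etc. As part of extracting anchor text, the document analysis component 136 also associates a proximity class with each anchor. A proximity class may be described as specifying how close the source document is to the target document (e.g., whether they are on the same server and, if so, whether they are in the same directory). The extracted anchor text is then ready to be processed by the anchor text component 140.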
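A proximity class can be derived from the two URLs alone. The sketch below assumes a three-way scheme taken from the example in this paragraph (same directory, same server but a different directory, different server). The function name `proximity_class` and the class labels are hypothetical, and a real system might define more classes.

```python
from posixpath import dirname
from urllib.parse import urlsplit


def proximity_class(source_url, target_url):
    """Classify how close an anchor's source document is to its target.

    Hypothetical three-way scheme: "same_directory", "same_server"
    (same host, different directory), or "other_server".
    """
    src, tgt = urlsplit(source_url), urlsplit(target_url)
    if src.netloc.lower() != tgt.netloc.lower():
        return "other_server"
    if dirname(src.path) == dirname(tgt.path):
        return "same_directory"
    return "same_server"
```

Host names are compared case-insensitively because DNS names are case-insensitive; paths are compared as-is.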
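In block 206, the static rank component 134 reviews the stored documents and assigns a rank to each document. A rank may be described as the importance of a source document relative to the other documents stored by the crawler component 132. Any type of ranking technique may be used; for example, documents that are accessed more frequently may receive a higher rank.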
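In block 208, the context information generator 141 sorts the anchors by target document. As a result, the set of anchors for each target document is brought together into one group for further processing. As described with reference to Figures 3A and 3B, each group is processed separately for its target document.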
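Block 208 amounts to sorting the anchors by target and cutting the sorted run into one group per target. Here is a minimal in-memory Python sketch. It assumes a hypothetical `Anchor` record with `source`, `target`, and `text` fields, and `anchors_by_target` is likewise a hypothetical name.

```python
from dataclasses import dataclass
from itertools import groupby
from operator import attrgetter


@dataclass(frozen=True)
class Anchor:
    source: str  # URL of the document that contains the link
    target: str  # URL of the document the link points to
    text: str    # anchor text


def anchors_by_target(anchors):
    """Sort anchors by target URL and return {target: [Anchor, ...]}."""
    ordered = sorted(anchors, key=attrgetter("target"))
    return {target: list(group)
            for target, group in groupby(ordered, key=attrgetter("target"))}
```

At web scale this sort would more likely be an external or distributed sort, but the grouping logic stays the same.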
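Figures 3A and 3B illustrate logic implemented to process anchor text in accordance with certain implementations of the invention. Control begins at block 300, where the context information generator 141 determines the predominant language of the source documents of the anchors in the anchor set for a target document. In certain implementations, if more than a configurable percentage of the source documents share the same language, anchors whose source documents are in a language other than this predominant language are removed from the set. A configurable percentage may be described as a percentage that can be modified, for example, by a system administrator or by another application.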
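The following is a sketch of block 300. It assumes the per-document languages from block 204 are available as a dictionary keyed by source URL. As the text describes, the majority is counted over distinct source documents. The configurable percentage is the `threshold` parameter, and its default of 0.5 is an arbitrary assumption. The function name and signature are hypothetical.

```python
from collections import Counter


def prune_by_language(anchors, source_language, threshold=0.5):
    """Keep only anchors whose source document is in the predominant language.

    anchors: (source_url, anchor_text) pairs for one target document.
    source_language: source_url -> language code from per-document analysis.
    threshold: configurable fraction; pruning happens only when MORE than this
    fraction of distinct source documents share one language.
    Returns (predominant_language, kept_anchors).
    """
    if not anchors:
        return None, []
    sources = sorted({src for src, _ in anchors})  # distinct source documents
    counts = Counter(source_language[src] for src in sources)
    language, count = counts.most_common(1)[0]
    if count / len(sources) <= threshold:
        return language, list(anchors)  # no clear majority: prune nothing
    kept = [(src, text) for src, text in anchors if source_language[src] == language]
    return language, kept
```

The predominant language is returned even when nothing is pruned, because block 316 later uses it to infer the target's language.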
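In block 302, the context information generator 141 removes anchors whose anchor text contains the path (e.g., URL) to the target document or a portion of that path. In block 304, the context information generator 141 removes anchors based on whether, and in what order or combination, their anchor text contains words from a configurable word set (e.g., it may remove anchor text that contains only words from the configurable set, that contains at least a certain number of words from the set, or that contains words from the set in a particular order). The configurable word set may be determined, for example, by a system administrator, and may include stop words such as "click here" or "the".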
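Blocks 302 and 304 can be sketched as two predicates. The sketch assumes that "a portion of the path" means the full URL, the host, the path, or a file name with an extension. It implements only the simplest word-set rule, which removes text made up entirely of configured words; the rules based on word count or word order are not shown. `STOP_WORDS` and all function names are hypothetical.

```python
import re
from posixpath import basename
from urllib.parse import urlsplit

# Hypothetical configurable word set; an administrator would normally maintain it.
STOP_WORDS = {"click", "here", "the", "next", "more", "link"}


def contains_target_path(text, target_url):
    """True if the anchor text contains the target URL or a recognizable part of it."""
    parts = urlsplit(target_url)
    pieces = [target_url, parts.netloc + parts.path, parts.netloc]
    if len(parts.path) > 1:
        pieces.append(parts.path)
    name = basename(parts.path)
    if "." in name:  # file names such as "index.html"
        pieces.append(name)
    lowered = text.lower()
    return any(piece and piece.lower() in lowered for piece in pieces)


def only_stop_words(text, stop_words=STOP_WORDS):
    """True if every word of the text is in the word set (empty text counts as noise)."""
    return all(word in stop_words for word in re.findall(r"\w+", text.lower()))


def filter_noise(anchor_texts, target_url, stop_words=STOP_WORDS):
    """Drop anchor texts that echo the target path or consist only of stop words."""
    return [text for text in anchor_texts
            if not contains_target_path(text, target_url)
            and not only_stop_words(text, stop_words)]
```

For a target of `http://a.com/hr/index.html`, this keeps "Human Resources" and "HR policies" but drops "Click here!", "index.html", and the bare URL. The sketch deliberately avoids matching short path segments such as `hr`, because they often coincide with meaningful anchor words.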
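In block 306, the context information generator 141 sorts the anchor set by anchor text and groups anchors with identical anchor text together. In block 308, the context information generator 141 computes, for each group, a weighted sum of the occurrences of the text. The weight of each individual occurrence may be determined by the proximity class of its anchor. For example, if a first document has proximity class A, a second document has proximity class B, a third document has proximity class C, and classes A, B, and C have weights 10, 5, and 2, respectively, then the weighted sum is 10 + 5 + 2 = 17.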
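Here is a sketch of blocks 306 and 308 that uses the weights from the worked example (A = 10, B = 5, C = 2). Anchor texts are compared as exact strings; any normalization such as case folding would be an additional assumption. The function names are hypothetical.

```python
from collections import defaultdict

# Proximity-class weights from the worked example in the text.
CLASS_WEIGHTS = {"A": 10, "B": 5, "C": 2}


def group_by_text(anchors):
    """anchors: (anchor_text, proximity_class) pairs -> {anchor_text: [class, ...]}."""
    groups = defaultdict(list)
    for text, cls in sorted(anchors):  # sort by anchor text, as in block 306
        groups[text].append(cls)
    return dict(groups)


def weighted_occurrences(classes, weights=CLASS_WEIGHTS):
    """Weighted sum of occurrences: each anchor contributes its class weight."""
    return sum(weights[cls] for cls in classes)
```

For a group with one anchor from each of classes A, B, and C, `weighted_occurrences(["A", "B", "C"])` returns 17, which matches the text.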
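In block 310, the context information generator 141 computes a cumulative rank for each group. That is, each anchor in the group contributes to this rank according to the rank of its source document and its proximity class. For example, with the same proximity classes and weights as above (A = 10, B = 5, C = 2), if the first, second, and third documents have static ranks of 9, 13, and 16, respectively, and the cumulative rank is computed as a weighted average, then the cumulative rank is (9*10 + 13*5 + 16*2)/(10 + 5 + 2) = 187/17 = 11. Other techniques for computing the cumulative rank include taking the minimum, the maximum, or a combination of the two, or giving the ranks of one proximity class precedence over those of the other classes.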
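The weighted-average cumulative rank from the worked example can be computed as shown below. The sketch also covers the minimum and maximum alternatives mentioned in the text. The `method` parameter and the function name are hypothetical, and the sketch does not show min/max combinations or class-precedence rules.

```python
def cumulative_rank(members, weights, method="weighted_mean"):
    """Cumulative rank of one anchor-text group.

    members: (static_rank_of_source_document, proximity_class) pairs.
    method: "weighted_mean" (the worked example), "min", or "max".
    """
    ranks = [rank for rank, _ in members]
    if method == "min":
        return min(ranks)
    if method == "max":
        return max(ranks)
    if method != "weighted_mean":
        raise ValueError(f"unknown method: {method}")
    total_weight = sum(weights[cls] for _, cls in members)
    return sum(rank * weights[cls] for rank, cls in members) / total_weight
```

With ranks 9, 13, and 16 in classes A, B, and C, the weighted mean is 187 / 17 = 11.0, the minimum is 9, and the maximum is 16.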
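In block 312, the context information generator 141 computes a linguistic score for each group. In certain implementations, this score may be computed by a linguistic analysis of the text that rates how well the text can be displayed as a title. For example, suitability for display as a title may be determined by considering the number of words in the text (e.g., titles should be short), by further linguistic analysis of the text, by the number of occurrences of each word, or of words, across all anchors in the anchor set pointing to the target document, or, when the target document is accessible, by statistical analysis of the similarity between the anchors and the target document's content.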
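The text leaves the details of the linguistic analysis open. As one hypothetical illustration, the sketch below scores only two things: word count and the share of letters among non-space characters. It does not implement word-frequency statistics or similarity with the target document, and the `max_words` cutoff of 8 is an arbitrary assumption.

```python
import re


def linguistic_score(text, max_words=8):
    """Hypothetical title-suitability score in [0, 1].

    Short texts score higher (titles should be brief), and texts with a high
    share of digits or punctuation among their visible characters score lower.
    """
    words = re.findall(r"\w+", text)
    if not words:
        return 0.0
    length_score = 1.0 if len(words) <= max_words else max_words / len(words)
    visible = [ch for ch in text if not ch.isspace()]
    alpha_ratio = sum(ch.isalpha() for ch in visible) / len(visible)
    return length_score * alpha_ratio
```

`str.isalpha` also accepts CJK characters, so Japanese or Chinese titles are not penalized by the letter-share factor.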
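In block 314, the context information generator 141 computes a combined relevance score for each group from the weighted sum of occurrences, the cumulative static rank, and the linguistic score.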
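The text does not specify how the three signals are combined. One possible combination is sketched here, offered purely as an assumption. It adds log-damped weighted occurrences to a scaled cumulative rank, then multiplies the result by the linguistic score, so text unsuitable as a title scores zero. `rank_weight` and the function name are hypothetical.

```python
import math


def combined_relevance(weighted_count, cumulative_rank, linguistic, rank_weight=0.5):
    """Hypothetical combined relevance score for one anchor-text group.

    Frequency evidence is log-damped so heavily repeated texts do not dominate,
    rank evidence is added linearly, and the linguistic score is a multiplier.
    """
    evidence = math.log1p(weighted_count) + rank_weight * cumulative_rank
    return evidence * linguistic
```

A multiplicative gate was chosen because a frequent, well-ranked text that cannot be displayed as a title should still never be picked as one.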
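In block 316, the context information generator 141 generates context information for the target document. In certain implementations, the context information generator 141 selects the text of the group with the highest combined relevance score as a pseudo-title, composes an anchor-based static summary for the target document from the anchor text of the n groups with the highest relevance scores, and infers the language of the target document from the predominant source language.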
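Block 316 can be sketched as a sort over the combined scores. The sketch assumes the scores have already been computed for each anchor-text group. It breaks ties alphabetically and joins the top n texts with " | " to form the summary. The separator, the tie-breaking rule, the default `n = 3`, and the function name are all assumptions.

```python
def build_context(group_scores, predominant_language, n=3):
    """Derive context information for one target document.

    group_scores: {anchor_text: combined_relevance_score}.
    Returns the pseudo-title (best group), an anchor-based summary built from
    the top-n group texts, and the language inferred from the source documents.
    """
    ranked = sorted(group_scores.items(), key=lambda item: (-item[1], item[0]))
    top = [text for text, _ in ranked[:n]]
    return {
        "title": top[0] if top else None,
        "summary": " | ".join(top),
        "language": predominant_language,
    }
```

For scores {"Human Resources": 5.0, "HR": 3.0, "Benefits": 2.0, "Jobs": 1.0}, the pseudo-title is "Human Resources" and the summary is "Human Resources | HR | Benefits".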
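Once anchor text processing is complete, the indexing component 142 generates an index using the processed anchor text.
Figure 4 illustrates logic for performing a document search in accordance with certain implementations of the invention. Control begins at block 400, where a user submits a search request via the viewer application 112. In block 402, the search engine 130 executes the search request. In block 404, the search engine returns search results that reflect the anchor text processing and the other processing described with reference to Figures 2, 3A, and 3B. In block 406, the viewer application 112 displays the search results.
Thus, certain implementations of the invention provide techniques for generating high-quality context information for search result items from anchor sets. In certain implementations, per-document analysis is performed to identify the language in which each document is written, a global analysis of all documents is performed to assign a static rank to each document, and the anchors are sorted by target document to obtain, for each target document, a logical set of all anchors pointing to it. For each anchor set pointing to a target document, the following processing may be performed: analyzing the distribution of source document languages; pruning anchors from the set based on that language distribution; filtering noise based on stop words and URL detection; classifying each anchor by the proximity of its source to its target; and assigning a weight to each proximity class. In addition, each anchor may be scored based on a linguistic analysis of its anchor text. Finally, the remaining unique anchor texts (the same text may appear on different anchors) may be ranked by relevance based on the weighted sum of occurrences in each proximity class, the cumulative rank of all source documents, and the linguistic score of the text.
The result of anchor text processing is high-quality titles, summaries, and other context information (e.g., the most likely language of each target). For search results whose target documents are unavailable, this context information can be displayed to the user. If the target document itself is available, the generated context information can be used to enrich the information obtained from the target document (e.g., by finding similarities between the document and its anchors).
The described techniques for processing anchor text may be implemented as a method, apparatus, or article of manufacture using standard programming and/or engineering techniques to produce software, firmware, hardware, or any combination thereof. The term "article of manufacture" as used herein refers to code or logic implemented in hardware logic (e.g., an integrated circuit chip, Programmable Gate Array (PGA), Application Specific Integrated Circuit (ASIC), etc.) or in a computer readable medium, such as magnetic storage media (e.g., hard disk drives, floppy disks, tape, etc.), optical storage (CD-ROMs, optical disks, etc.), and volatile and non-volatile memory devices (e.g., EEPROMs, ROMs, PROMs, RAMs, DRAMs, SRAMs, firmware, programmable logic, etc.). Code in the computer readable medium is accessed and executed by a processor. The code in which the various implementations are implemented may further be accessible through a transmission medium or from a file server over a network. In such cases, the article of manufacture in which the code is implemented may comprise a transmission medium, such as a network transmission line, wireless transmission media, signals propagating through space, radio waves, infrared signals, etc. Thus, the "article of manufacture" may comprise the medium in which the code is embodied. Additionally, the "article of manufacture" may comprise a combination of hardware and software components in which the code is embodied, processed, and executed. Of course, those skilled in the art will recognize that many modifications may be made to this configuration without departing from the scope of the present invention, and that the article of manufacture may comprise any information-bearing medium known in the art.
The logic of Figures 2, 3A, 3B, and 4 describes specific operations occurring in a particular order. In alternative implementations, certain logic operations may be performed in a different order, modified, or removed. Moreover, operations may be added to the above-described logic and still conform to the described implementations. Further, operations described herein may occur sequentially, certain operations may be processed in parallel, or operations described as performed by a single process may be performed by distributed processes.
The logic illustrated in Figures 2, 3A, 3B, and 4 may be implemented in software, hardware, programmable and non-programmable gate array logic, or in some combination of hardware, software, or gate array logic.
Figure 5 illustrates an architecture of a computer system that may be used in accordance with certain implementations of the invention. For example, the client computer 100, the server computer 120, and/or the operator console 180 may implement computer architecture 500. The computer architecture 500 may implement a processor 502 (e.g., a microprocessor), a memory 504 (e.g., a volatile memory device), and storage 510 (e.g., a non-volatile storage area, such as magnetic disk drives, optical disk drives, tape drives, etc.). An operating system 505 may execute in memory 504. The storage 510 may comprise an internal storage device or an attached or network-accessible storage. Computer programs 506 in storage 510 may be loaded into memory 504 and executed by the processor 502 in a manner known in the art. The architecture further includes a network card 508 to enable communication with a network. An input device 512 is used to provide user input to the processor 502 and may include a keyboard, mouse, pen-stylus, microphone, touch-sensitive display screen, or any other activation or input mechanism known in the art. An output device 514 is capable of rendering information from the processor 502 or other components, such as a display monitor, printer, storage, etc. The computer architecture 500 of the computer systems may include fewer components than illustrated, additional components not illustrated herein, or some combination of the illustrated and additional components.
The computer architecture 500 may comprise any computing device known in the art, such as a mainframe, server, personal computer, workstation, laptop, handheld computer, telephony device, network appliance, virtualization device, storage controller, etc. Any processor 502 and operating system 505 known in the art may be used.
The foregoing description of implementations of the invention has been presented for the purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise form disclosed. Many modifications and variations are possible in light of the above teaching. It is intended that the scope of the invention be limited not by this detailed description, but rather by the claims appended hereto. The above specification, examples, and data provide a complete description of the manufacture and use of the composition of the invention. Since many implementations of the invention can be made without departing from the spirit and scope of the invention, the invention resides in the claims hereinafter appended.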
Claims (19)
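1. A method for processing anchor text, comprising: forming a set of anchors pointing to a target document; grouping together anchors having the same anchor text; computing information for each group; and generating context information for the target document based on the computed information.
2. The method of claim 1, further comprising: determining a language of each document in a document collection; determining a rank of each document in the document collection; and determining a proximity class of each document in the document collection.
3. The method of claim 1 or claim 2, further comprising: determining a predominant language in the anchor set; and pruning from the set anchors that are not in the predominant language.
4. The method of any one of claims 1, 2, or 3, further comprising: pruning from the set anchors that include at least a portion of a path to the target document.
5. The method of any preceding claim, further comprising: pruning anchors based on a configurable word set.
6. The method of any preceding claim, wherein computing the information further comprises: computing a weighted sum of occurrences for the anchor text of the anchors in each group.
7. The method of any preceding claim, wherein computing the information further comprises: computing a cumulative rank for each group.
8. The method of any preceding claim, wherein computing the information further comprises: computing a linguistic score for each group.
9. The method of any preceding claim, wherein computing the information further comprises: generating a relevance score for each group.
10. A computer system comprising logic for processing anchor text, the logic comprising: forming a set of anchors pointing to a target document; grouping together anchors having the same anchor text; computing information for each group; and generating context information for the target document based on the computed information.
11. The computer system of claim 10, wherein the logic further comprises: determining a language of each document in a document collection; determining a rank of each document in the document collection; and determining a proximity class of each document in the document collection.
12. The computer system of claim 10 or claim 11, wherein the logic further comprises: determining a predominant language in the anchor set; and pruning from the set anchors that are not in the predominant language.
13. The computer system of any one of claims 10 to 12, wherein the logic further comprises: pruning from the set anchors that include at least a portion of a path to the target document.
14. The computer system of any one of claims 10 to 13, wherein the logic further comprises: pruning anchors based on a configurable word set.
15. The computer system of any one of claims 10 to 14, wherein the logic for computing the information further comprises: computing a weighted sum of occurrences for the anchor text of the anchors in each group.
16. The computer system of any one of claims 10 to 15, wherein the logic for computing the information further comprises: computing a cumulative rank for each group.
17. The computer system of any one of claims 10 to 16, wherein the logic for computing the information further comprises: computing a linguistic score for each group.
18. The computer system of any one of claims 10 to 17, wherein the logic for computing the information further comprises: generating a relevance score for each group.
19. A computer program for processing anchor text in documents, wherein the program causes operations to be performed in a data processing apparatus, the operations comprising: forming a set of anchors pointing to a target document; grouping together anchors having the same anchor text; computing information for each group; and generating context information for the target document based on the computed information.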
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US10/764,801 US7499913B2 (en) | 2004-01-26 | 2004-01-26 | Method for handling anchor text |
US10/764,801 | 2004-01-26 |
Publications (1)
Publication Number | Publication Date |
---|---|
CN1906614A (zh) | 2007-01-31 |
Family
ID=34795353
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CNA2005800018061A (Pending; published as CN1906614A) | Method, system, and program for handling anchor text | 2004-01-26 | 2005-01-26 |
Country Status (5)
Country | Link |
---|---|
US (2) | US7499913B2 (zh) |
EP (1) | EP1714223A1 (zh) |
JP (1) | JP2007519111A (zh) |
CN (1) | CN1906614A (zh) |
WO (1) | WO2005071566A1 (zh) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107132963A (zh) * | 2017-05-08 | 2017-09-05 | 深圳乐信软件技术有限公司 | Red-dot message display method, elimination method, and corresponding devices |
CN111625615A (zh) * | 2019-02-27 | 2020-09-04 | 国际商业机器公司 | Text extraction and processing |
Families Citing this family (49)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7293005B2 (en) | 2004-01-26 | 2007-11-06 | International Business Machines Corporation | Pipelined architecture for global analysis and index building |
US7424467B2 (en) * | 2004-01-26 | 2008-09-09 | International Business Machines Corporation | Architecture for an indexer with fixed width sort and variable width sort |
US8296304B2 (en) | 2004-01-26 | 2012-10-23 | International Business Machines Corporation | Method, system, and program for handling redirects in a search engine |
US7499913B2 (en) * | 2004-01-26 | 2009-03-03 | International Business Machines Corporation | Method for handling anchor text |
US7584221B2 (en) * | 2004-03-18 | 2009-09-01 | Microsoft Corporation | Field weighting in text searching |
US7260573B1 (en) * | 2004-05-17 | 2007-08-21 | Google Inc. | Personalizing anchor text scores in a search engine |
US7461064B2 (en) | 2004-09-24 | 2008-12-02 | International Business Machines Corporation | Method for searching documents for ranges of numeric values |
US7606793B2 (en) | 2004-09-27 | 2009-10-20 | Microsoft Corporation | System and method for scoping searches using index keys |
US8276099B2 (en) * | 2004-09-28 | 2012-09-25 | David Arthur Yost | System of GUI text cursor, caret, and selection |
US7739277B2 (en) * | 2004-09-30 | 2010-06-15 | Microsoft Corporation | System and method for incorporating anchor text into ranking search results |
US7761448B2 (en) | 2004-09-30 | 2010-07-20 | Microsoft Corporation | System and method for ranking search results using click distance |
US7827181B2 (en) | 2004-09-30 | 2010-11-02 | Microsoft Corporation | Click distance determination |
US7716198B2 (en) * | 2004-12-21 | 2010-05-11 | Microsoft Corporation | Ranking search results using feature extraction |
US7769579B2 (en) | 2005-05-31 | 2010-08-03 | Google Inc. | Learning facts from semi-structured text |
US20060161553A1 (en) * | 2005-01-19 | 2006-07-20 | Tiny Engine, Inc. | Systems and methods for providing user interaction based profiles |
US20060161587A1 (en) * | 2005-01-19 | 2006-07-20 | Tiny Engine, Inc. | Psycho-analytical system and method for audio and visual indexing, searching and retrieval |
US20060161543A1 (en) * | 2005-01-19 | 2006-07-20 | Tiny Engine, Inc. | Systems and methods for providing search results based on linguistic analysis |
US7792833B2 (en) * | 2005-03-03 | 2010-09-07 | Microsoft Corporation | Ranking search results using language types |
US9208229B2 (en) * | 2005-03-31 | 2015-12-08 | Google Inc. | Anchor text summarization for corroboration |
US8417693B2 (en) | 2005-07-14 | 2013-04-09 | International Business Machines Corporation | Enforcing native access control to indexed documents |
US7599917B2 (en) * | 2005-08-15 | 2009-10-06 | Microsoft Corporation | Ranking search results using biased click distance |
US8095565B2 (en) * | 2005-12-05 | 2012-01-10 | Microsoft Corporation | Metadata driven user interface |
US8560942B2 (en) * | 2005-12-15 | 2013-10-15 | Microsoft Corporation | Determining document layout between different views |
US20070150477A1 (en) * | 2005-12-22 | 2007-06-28 | International Business Machines Corporation | Validating a uniform resource locator ('URL') in a document |
US20070260597A1 (en) * | 2006-05-02 | 2007-11-08 | Mark Cramer | Dynamic search engine results employing user behavior |
US8442973B2 (en) * | 2006-05-02 | 2013-05-14 | Surf Canyon, Inc. | Real time implicit user modeling for personalized search |
US8117197B1 (en) | 2008-06-10 | 2012-02-14 | Surf Canyon, Inc. | Adaptive user interface for real-time search relevance feedback |
US8458207B2 (en) * | 2006-09-15 | 2013-06-04 | Microsoft Corporation | Using anchor text to provide context |
US8122026B1 (en) | 2006-10-20 | 2012-02-21 | Google Inc. | Finding and disambiguating references to entities on web pages |
US7657507B2 (en) | 2007-03-02 | 2010-02-02 | Microsoft Corporation | Pseudo-anchor text extraction for vertical search |
US8347202B1 (en) | 2007-03-14 | 2013-01-01 | Google Inc. | Determining geographic locations for place names in a fact repository |
CN101399818B (zh) * | 2007-09-25 | 2012-08-29 | 日电(中国)有限公司 | Topic-related web page filtering method and system based on navigation path information |
US7840569B2 (en) | 2007-10-18 | 2010-11-23 | Microsoft Corporation | Enterprise relevancy ranking using a neural network |
US9348912B2 (en) | 2007-10-18 | 2016-05-24 | Microsoft Technology Licensing, Llc | Document length as a static relevance feature for ranking search results |
US8812493B2 (en) | 2008-04-11 | 2014-08-19 | Microsoft Corporation | Search results ranking using editing distance and document information |
US20100318533A1 (en) * | 2009-06-10 | 2010-12-16 | Yahoo! Inc. | Enriched document representations using aggregated anchor text |
US8380722B2 (en) * | 2010-03-29 | 2013-02-19 | Microsoft Corporation | Using anchor text with hyperlink structures for web searches |
WO2011123981A1 (en) | 2010-04-07 | 2011-10-13 | Google Inc. | Detection of boilerplate content |
US8738635B2 (en) | 2010-06-01 | 2014-05-27 | Microsoft Corporation | Detection of junk in search result ranking |
US8793706B2 (en) | 2010-12-16 | 2014-07-29 | Microsoft Corporation | Metadata-based eventing supporting operations on data |
US9779385B2 (en) | 2011-06-24 | 2017-10-03 | Facebook, Inc. | Inferring topics from social networking system communications |
US9495462B2 (en) | 2012-01-27 | 2016-11-15 | Microsoft Technology Licensing, Llc | Re-ranking search results |
US10380606B2 (en) | 2012-08-03 | 2019-08-13 | Facebook, Inc. | Negative signals for advertisement targeting |
US9558233B1 (en) | 2012-11-30 | 2017-01-31 | Google Inc. | Determining a quality measure for a resource |
US9208233B1 (en) | 2012-12-31 | 2015-12-08 | Google Inc. | Using synthetic descriptive text to rank search results |
US9208232B1 (en) | 2012-12-31 | 2015-12-08 | Google Inc. | Generating synthetic descriptive text |
US20150169701A1 (en) * | 2013-01-25 | 2015-06-18 | Google Inc. | Providing customized content in knowledge panels |
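CN104965902A (zh) * | 2015-06-30 | 2015-10-07 | 北京奇虎科技有限公司 | Method and device for identifying enriched URLs |
CN111680152B (zh) * | 2020-06-10 | 2023-04-18 | 创新奇智(成都)科技有限公司 | Method and apparatus for extracting a summary of a target text, electronic device, and storage medium |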
Family Cites Families (199)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6182062B1 (en) * | 1986-03-26 | 2001-01-30 | Hitachi, Ltd. | Knowledge based information retrieval system |
US4965763A (en) | 1987-03-03 | 1990-10-23 | International Business Machines Corporation | Computer method for automatic extraction of commonly specified information from business correspondence |
US5265221A (en) | 1989-03-20 | 1993-11-23 | Tandem Computers | Access restriction facility method and apparatus |
US5187790A (en) | 1989-06-29 | 1993-02-16 | Digital Equipment Corporation | Server impersonation of client processes in an object based computer operating system |
US5129152A (en) | 1990-12-20 | 1992-07-14 | Hughes Aircraft Company | Fast contact measuring machine |
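JP2943447B2 (ja) | 1991-01-30 | 1999-08-30 | 三菱電機株式会社 | Text information extraction device, text similarity matching device, text retrieval system, text information extraction method, text similarity matching method, and question analysis device |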
US5287496A (en) | 1991-02-25 | 1994-02-15 | International Business Machines Corporation | Dynamic, finite versioning for concurrent transaction and query processing |
US5423032A (en) | 1991-10-31 | 1995-06-06 | International Business Machines Corporation | Method for extracting multi-word technical terms from text |
US5685003A (en) | 1992-12-23 | 1997-11-04 | Microsoft Corporation | Method and system for automatically indexing data in a document using a fresh index table |
US5873097A (en) | 1993-05-12 | 1999-02-16 | Apple Computer, Inc. | Update mechanism for computer storage container manager |
US5638543A (en) | 1993-06-03 | 1997-06-10 | Xerox Corporation | Method and apparatus for automatic document summarization |
US5544352A (en) | 1993-06-14 | 1996-08-06 | Libertech, Inc. | Method and apparatus for indexing, searching and displaying data |
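JP3547098B2 (ja) | 1994-06-06 | 2004-07-28 | トヨタ自動車株式会社 | Thermal spraying method, method for manufacturing a sliding member whose sliding surface is a sprayed layer, piston, and method for manufacturing a piston |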
US5664172A (en) | 1994-07-19 | 1997-09-02 | Oracle Corporation | Range-based query optimizer |
US5903646A (en) | 1994-09-02 | 1999-05-11 | Rackman; Michael I. | Access control system for litigation document production |
US5574906A (en) | 1994-10-24 | 1996-11-12 | International Business Machines Corporation | System and method for reducing storage requirement in backup subsystems utilizing segmented compression and differencing |
US5729730A (en) | 1995-03-28 | 1998-03-17 | Dex Information Systems, Inc. | Method and apparatus for improved information storage and retrieval system |
US6182121B1 (en) | 1995-02-03 | 2001-01-30 | Enfish, Inc. | Method and apparatus for a physical storage architecture having an improved information storage and retrieval system for a shared file environment |
US5708825A (en) | 1995-05-26 | 1998-01-13 | Iconovex Corporation | Automatic summary page creation and hyperlink generation |
US5701469A (en) | 1995-06-07 | 1997-12-23 | Microsoft Corporation | Method and system for generating accurate search results using a content-index |
US5721938A (en) | 1995-06-07 | 1998-02-24 | Stuckey; Barbara K. | Method and device for parsing and analyzing natural language sentences and text |
US5794177A (en) | 1995-07-19 | 1998-08-11 | Inso Corporation | Method and apparatus for morphological analysis and generation of natural language text |
US5721939A (en) | 1995-08-03 | 1998-02-24 | Xerox Corporation | Method and apparatus for tokenizing text |
US6026388A (en) | 1995-08-16 | 2000-02-15 | Textwise, Llc | User interface and other enhancements for natural language information retrieval system and method |
US5963940A (en) | 1995-08-16 | 1999-10-05 | Syracuse University | Natural language information retrieval system and method |
JP3441306B2 (ja) | 1995-09-12 | 2003-09-02 | 株式会社東芝 | Client device, message transmission method, server device, page processing method, and relay server device |
US5745906A (en) | 1995-11-14 | 1998-04-28 | Deltatech Research, Inc. | Method and apparatus for merging delta streams to reconstruct a computer file |
US5729743A (en) | 1995-11-17 | 1998-03-17 | Deltatech Research, Inc. | Computer apparatus and method for merging system deltas |
US5745904A (en) | 1996-01-12 | 1998-04-28 | Microsoft Corporation | Buffered table user index |
US5862325A (en) | 1996-02-29 | 1999-01-19 | Intermind Corporation | Computer-based communication system and method using metadata defining a control structure |
US5778378A (en) | 1996-04-30 | 1998-07-07 | International Business Machines Corporation | Object oriented information retrieval framework mechanism |
JP3108015B2 (ja) * | 1996-05-22 | 2000-11-13 | 松下電器産業株式会社 | Hypertext retrieval device |
JP3061765B2 (ja) | 1996-05-23 | 2000-07-10 | ゼロックス コーポレイション | Computer-based document processing method |
US5920859A (en) * | 1997-02-05 | 1999-07-06 | Idd Enterprises, L.P. | Hypertext document retrieval system and method |
WO1997049048A1 (en) * | 1996-06-17 | 1997-12-24 | Idd Enterprises, L.P. | Hypertext document retrieval system and method |
US5909677A (en) | 1996-06-18 | 1999-06-01 | Digital Equipment Corporation | Method for determining the resemblance of documents |
US5832480A (en) * | 1996-07-12 | 1998-11-03 | International Business Machines Corporation | Using canonical forms to develop a dictionary of names in a text |
US5995980A (en) | 1996-07-23 | 1999-11-30 | Olson; Jack E. | System and method for database update replication |
US5832500A (en) | 1996-08-09 | 1998-11-03 | Digital Equipment Corporation | Method for searching an index |
US5745900A (en) | 1996-08-09 | 1998-04-28 | Digital Equipment Corporation | Method for indexing duplicate database records using a full-record fingerprint |
US5852820A (en) | 1996-08-09 | 1998-12-22 | Digital Equipment Corporation | Method for optimizing entries for searching an index |
US5745898A (en) | 1996-08-09 | 1998-04-28 | Digital Equipment Corporation | Method for generating a compressed index of information of records of a database |
US5765168A (en) | 1996-08-09 | 1998-06-09 | Digital Equipment Corporation | Method for maintaining an index |
US5797008A (en) | 1996-08-09 | 1998-08-18 | Digital Equipment Corporation | Memory storing an integrated index of database records |
US5765149A (en) | 1996-08-09 | 1998-06-09 | Digital Equipment Corporation | Modified collection frequency ranking method |
US5745889A (en) | 1996-08-09 | 1998-04-28 | Digital Equipment Corporation | Method for parsing information of databases records using word-location pairs and metaword-location pairs |
US5787435A (en) | 1996-08-09 | 1998-07-28 | Digital Equipment Corporation | Method for mapping an index of a database into an array of files |
US5864863A (en) | 1996-08-09 | 1999-01-26 | Digital Equipment Corporation | Method for parsing, indexing and searching world-wide-web pages |
US5745894A (en) | 1996-08-09 | 1998-04-28 | Digital Equipment Corporation | Method for generating and searching a range-based index of word-locations |
US5765158A (en) | 1996-08-09 | 1998-06-09 | Digital Equipment Corporation | Method for sampling a compressed index to create a summarized index |
US5809502A (en) | 1996-08-09 | 1998-09-15 | Digital Equipment Corporation | Object-oriented interface for an index |
US5745890A (en) | 1996-08-09 | 1998-04-28 | Digital Equipment Corporation | Sequential searching of a database index using constraints on word-location pairs |
US5765150A (en) | 1996-08-09 | 1998-06-09 | Digital Equipment Corporation | Method for statistically projecting the ranking of information |
US5745899A (en) | 1996-08-09 | 1998-04-28 | Digital Equipment Corporation | Method for indexing information of a database |
US5724033A (en) | 1996-08-09 | 1998-03-03 | Digital Equipment Corporation | Method for encoding delta values |
JP2001505330A (ja) | 1996-08-22 | 2001-04-17 | ルノー・アンド・オスピー・スピーチ・プロダクツ・ナームローゼ・ベンノートシャープ | Method and apparatus for providing word breaks in a text stream |
US5924091A (en) | 1996-08-28 | 1999-07-13 | Sybase, Inc. | Database system with improved methods for radix sorting |
US6078914A (en) * | 1996-12-09 | 2000-06-20 | Open Text Corporation | Natural language meta-search system and method |
US6285999B1 (en) | 1997-01-10 | 2001-09-04 | The Board Of Trustees Of The Leland Stanford Junior University | Method for node ranking in a linked database |
JP3579204B2 (ja) | 1997-01-17 | 2004-10-20 | 富士通株式会社 | Document summarization device and method |
US5903891A (en) | 1997-02-25 | 1999-05-11 | Hewlett-Packard Company | Hierarchial information processes that share intermediate data and formulate contract data |
US6278992B1 (en) | 1997-03-19 | 2001-08-21 | John Andrew Curtis | Search engine using indexing method for storing and retrieving data |
JP4243344B2 (ja) * | 1997-05-23 | 2009-03-25 | 株式会社Access | Mobile communication device |
US5884305A (en) | 1997-06-13 | 1999-03-16 | International Business Machines Corporation | System and method for data mining from relational data by sieving through iterated relational reinforcement |
EP0884688A3 (en) | 1997-06-16 | 2005-06-22 | Koninklijke Philips Electronics N.V. | Sparse index search method |
US5933822A (en) | 1997-07-22 | 1999-08-03 | Microsoft Corporation | Apparatus and methods for an information retrieval system that employs natural language processing of search results to improve overall precision |
US6078916A (en) * | 1997-08-01 | 2000-06-20 | Culliss; Gary | Method for organizing information |
US6026413A (en) | 1997-08-01 | 2000-02-15 | International Business Machines Corporation | Determining how changes to underlying data affect cached objects |
US7031954B1 (en) | 1997-09-10 | 2006-04-18 | Google, Inc. | Document retrieval system with access control |
US5974412A (en) | 1997-09-24 | 1999-10-26 | Sapient Health Network | Intelligent query system for automatically indexing information in a database and automatically categorizing users |
US6594682B2 (en) | 1997-10-28 | 2003-07-15 | Microsoft Corporation | Client-side system for scheduling delivery of web content and locally managing the web content |
US6061678A (en) | 1997-10-31 | 2000-05-09 | Oracle Corporation | Approach for managing access to large objects in database systems using large object indexes |
US6029165A (en) | 1997-11-12 | 2000-02-22 | Arthur Andersen Llp | Search and retrieval information system and method |
KR100285265B1 (ko) | 1998-02-25 | 2001-04-02 | 윤덕용 | Inverted index storage structure using sub-indexes and large objects for tight coupling of a database management system with information retrieval |
US6005503A (en) | 1998-02-27 | 1999-12-21 | Digital Equipment Corporation | Method for encoding and decoding a list of variable size integers to reduce branch mispredicts |
US6016501A (en) | 1998-03-18 | 2000-01-18 | Bmc Software | Enterprise data movement system and method which performs data load and changed data propagation operations |
US6119124A (en) | 1998-03-26 | 2000-09-12 | Digital Equipment Corporation | Method for clustering closely resembling data objects |
US6088694A (en) | 1998-03-31 | 2000-07-11 | International Business Machines Corporation | Continuous availability and efficient backup for externally referenced objects |
US6374268B1 (en) | 1998-04-14 | 2002-04-16 | Hewlett-Packard Company | Methods and systems for an incremental file system |
US6192333B1 (en) | 1998-05-12 | 2001-02-20 | Microsoft Corporation | System for creating a dictionary |
US6212522B1 (en) | 1998-05-15 | 2001-04-03 | International Business Machines Corporation | Searching and conditionally serving bookmark sets based on keywords |
US6205451B1 (en) | 1998-05-22 | 2001-03-20 | Oracle Corporation | Method and apparatus for incremental refresh of summary tables in a database system |
AU4196299A (en) | 1998-05-23 | 1999-12-13 | Eolas Technologies, Incorporated | Identification of features of multi-dimensional image data in hypermedia systems |
US6216175B1 (en) | 1998-06-08 | 2001-04-10 | Microsoft Corporation | Method for upgrading copies of an original file with same update data after normalizing differences between copies created during respective original installations |
US7024623B2 (en) | 1998-06-17 | 2006-04-04 | Microsoft Corporation | Method and system for placing an insertion point in an electronic document |
EP0981099A3 (en) | 1998-08-17 | 2004-04-21 | Connected Place Limited | A method of and an apparatus for merging a sequence of delta files |
US6243713B1 (en) | 1998-08-24 | 2001-06-05 | Excalibur Technologies Corp. | Multimedia document retrieval by application of multimedia queries to a unified index of multimedia data for a plurality of multimedia data types |
US6334131B2 (en) * | 1998-08-29 | 2001-12-25 | International Business Machines Corporation | Method for cataloging, filtering, and relevance ranking frame-based hierarchical information structures |
GB9818819D0 (en) | 1998-08-29 | 1998-10-21 | Int Computers Ltd | Time-versioned data storage mechanism |
US6308179B1 (en) | 1998-08-31 | 2001-10-23 | Xerox Corporation | User level controlled mechanism inter-positioned in a read/write path of a property-based document management system |
US6553385B2 (en) | 1998-09-01 | 2003-04-22 | International Business Machines Corporation | Architecture of a framework for information extraction from natural language documents |
US6519597B1 (en) | 1998-10-08 | 2003-02-11 | International Business Machines Corporation | Method and apparatus for indexing structured documents with rich data types |
US6336122B1 (en) * | 1998-10-15 | 2002-01-01 | International Business Machines Corporation | Object oriented class archive file maker and method |
US6519593B1 (en) | 1998-12-15 | 2003-02-11 | Yossi Matias | Efficient bundle sorting |
CA2256934C (en) | 1998-12-23 | 2002-04-02 | Hamid Bacha | System for electronic repository of data enforcing access control on data retrieval |
US6295529B1 (en) | 1998-12-24 | 2001-09-25 | Microsoft Corporation | Method and apparatus for indentifying clauses having predetermined characteristics indicative of usefulness in determining relationships between different texts |
US6381602B1 (en) | 1999-01-26 | 2002-04-30 | Microsoft Corporation | Enforcing access control on resources at a location other than the source location |
US6418433B1 (en) | 1999-01-28 | 2002-07-09 | International Business Machines Corporation | System and method for focussed web crawling |
US6584458B1 (en) | 1999-02-19 | 2003-06-24 | Novell, Inc. | Method and apparatuses for creating a full text index accommodating child words |
US6438535B1 (en) | 1999-03-18 | 2002-08-20 | Lockheed Martin Corporation | Relational database method for accessing information useful for the manufacture of, to interconnect nodes in, to repair and to maintain product and system units |
US6631496B1 (en) * | 1999-03-22 | 2003-10-07 | Nec Corporation | System for personalizing, organizing and managing web information |
US6393415B1 (en) | 1999-03-31 | 2002-05-21 | Verizon Laboratories Inc. | Adaptive partitioning techniques in performing query requests and request routing |
US6336117B1 (en) | 1999-04-30 | 2002-01-01 | International Business Machines Corporation | Content-indexing search system and method providing search results consistent with content filtering and blocking policies implemented in a blocking engine |
US6269361B1 (en) | 1999-05-28 | 2001-07-31 | Goto.Com | System and method for influencing a position on a search result list generated by a computer network search engine |
JP2000339309A (ja) | 1999-05-31 | 2000-12-08 | Sony Corp | Character string analysis apparatus, character string analysis method, and providing medium |
US7472349B1 (en) | 1999-06-01 | 2008-12-30 | Oracle International Corporation | Dynamic services infrastructure for allowing programmatic access to internet and other resources |
US6421655B1 (en) | 1999-06-04 | 2002-07-16 | Microsoft Corporation | Computer-based representations and reasoning methods for engaging users in goal-oriented conversations |
US6631369B1 (en) * | 1999-06-30 | 2003-10-07 | Microsoft Corporation | Method and system for incremental web crawling |
US6547829B1 (en) | 1999-06-30 | 2003-04-15 | Microsoft Corporation | Method and system for detecting duplicate documents in web crawls |
US6339772B1 (en) | 1999-07-06 | 2002-01-15 | Compaq Computer Corporation | System and method for performing database operations on a continuous stream of tuples |
US6463439B1 (en) | 1999-07-15 | 2002-10-08 | American Management Systems, Incorporated | System for accessing database tables mapped into memory for high performance data retrieval |
US7065784B2 (en) | 1999-07-26 | 2006-06-20 | Microsoft Corporation | Systems and methods for integrating access control with a namespace |
US6587458B1 (en) * | 1999-08-04 | 2003-07-01 | At&T Corporation | Method and apparatus for an internet Caller-ID delivery plus service |
US6754873B1 (en) * | 1999-09-20 | 2004-06-22 | Google Inc. | Techniques for finding related hyperlinked documents using link-based analysis |
US8914361B2 (en) * | 1999-09-22 | 2014-12-16 | Google Inc. | Methods and systems for determining a meaning of a document to match the document to content |
US6665666B1 (en) * | 1999-10-26 | 2003-12-16 | International Business Machines Corporation | System, method and program product for answering questions using a search engine |
JP2001134575A (ja) | 1999-10-29 | 2001-05-18 | International Business Machines Corporation | Method and system for detecting frequent patterns |
US6507846B1 (en) | 1999-11-09 | 2003-01-14 | Joint Technology Corporation | Indexing databases for efficient relational querying |
US6665657B1 (en) * | 1999-11-19 | 2003-12-16 | Niku Corporation | Method and system for cross browsing of various multimedia data sources in a searchable repository |
US6839702B1 (en) | 1999-12-15 | 2005-01-04 | Google Inc. | Systems and methods for highlighting search results |
US6725214B2 (en) | 2000-01-14 | 2004-04-20 | Dotnsf | Apparatus and method to support management of uniform resource locators and/or contents of database servers |
US6678409B1 (en) * | 2000-01-14 | 2004-01-13 | Microsoft Corporation | Parameterized word segmentation of unsegmented text |
US6615209B1 (en) | 2000-02-22 | 2003-09-02 | Google, Inc. | Detecting query-specific duplicate documents |
US20020032677A1 (en) | 2000-03-01 | 2002-03-14 | Jeff Morgenthaler | Methods for creating, editing, and updating searchable graphical database and databases of graphical images and information and displaying graphical images from a searchable graphical database or databases in a sequential or slide show format |
US6985948B2 (en) * | 2000-03-29 | 2006-01-10 | Fujitsu Limited | User's right information and keywords input based search query generating means method and apparatus for searching a file |
US6658406B1 (en) * | 2000-03-29 | 2003-12-02 | Microsoft Corporation | Method for selecting terms from vocabularies in a category-based system |
FR2807537B1 (fr) | 2000-04-06 | 2003-10-17 | France Telecom | Hypermedia resource search engine and associated indexing method |
US7173912B2 (en) | 2000-05-05 | 2007-02-06 | Fujitsu Limited | Method and system for modeling and advertising asymmetric topology of a node in a transport network |
US6850979B1 (en) | 2000-05-09 | 2005-02-01 | Sun Microsystems, Inc. | Message gates in a distributed computing environment |
US6643650B1 (en) * | 2000-05-09 | 2003-11-04 | Sun Microsystems, Inc. | Mechanism and apparatus for using messages to look up documents stored in spaces in a distributed computing environment |
US6868447B1 (en) | 2000-05-09 | 2005-03-15 | Sun Microsystems, Inc. | Mechanism and apparatus for returning results of services in a distributed computing environment |
US6789077B1 (en) | 2000-05-09 | 2004-09-07 | Sun Microsystems, Inc. | Mechanism and apparatus for web-based searching of URI-addressable repositories in a distributed computing environment |
SE517005C2 (sv) | 2000-05-31 | 2002-04-02 | Hapax Information Systems Ab | Text segmentation |
US20010049671A1 (en) | 2000-06-05 | 2001-12-06 | Joerg Werner B. | e-Stract: a process for knowledge-based retrieval of electronic information |
SE517496C2 (sv) | 2000-06-22 | 2002-06-11 | Hapax Information Systems Ab | Method and system for information extraction |
US6839665B1 (en) | 2000-06-27 | 2005-01-04 | Text Analysis International, Inc. | Automated generation of text analysis systems |
US6567804B1 (en) | 2000-06-27 | 2003-05-20 | Ncr Corporation | Shared computation of user-defined metrics in an on-line analytic processing system |
US6578032B1 (en) | 2000-06-28 | 2003-06-10 | Microsoft Corporation | Method and system for performing phrase/word clustering and cluster merging |
US6865575B1 (en) | 2000-07-06 | 2005-03-08 | Google, Inc. | Methods and apparatus for using a modified index to provide search results in response to an ambiguous search query |
US20030217052A1 (en) | 2000-08-24 | 2003-11-20 | Celebros Ltd. | Search engine method and apparatus |
US6701317B1 (en) | 2000-09-19 | 2004-03-02 | Overture Services, Inc. | Web page connectivity server construction |
JP4649731B2 (ja) * | 2000-11-27 | 2011-03-16 | NEC Corporation | Document summarization system and document summarization method |
US6633872B2 (en) * | 2000-12-18 | 2003-10-14 | International Business Machines Corporation | Extendible access control for lightweight directory access protocol |
US20030028564A1 (en) | 2000-12-19 | 2003-02-06 | Lingomotors, Inc. | Natural language method and system for matching and ranking documents in terms of semantic relatedness |
US6907423B2 (en) * | 2001-01-04 | 2005-06-14 | Sun Microsystems, Inc. | Search engine interface and method of controlling client searches |
US7356530B2 (en) | 2001-01-10 | 2008-04-08 | Looksmart, Ltd. | Systems and methods of retrieving relevant information |
US6766316B2 (en) | 2001-01-18 | 2004-07-20 | Science Applications International Corporation | Method and system of ranking and clustering for document indexing and retrieval |
US6658423B1 (en) * | 2001-01-24 | 2003-12-02 | Google, Inc. | Detecting duplicate and near-duplicate files |
US20020165707A1 (en) | 2001-02-26 | 2002-11-07 | Call Charles G. | Methods and apparatus for storing and processing natural language text data as a sequence of fixed length integers |
SE520533C2 (sv) | 2001-03-13 | 2003-07-22 | Picsearch Ab | Method, computer program and system for indexing digitized units |
US6904454B2 (en) * | 2001-03-21 | 2005-06-07 | Nokia Corporation | Method and apparatus for content repository with versioning and data modeling |
US7509492B2 (en) | 2001-03-27 | 2009-03-24 | Microsoft Corporation | Distributed scalable cryptographic access control |
US6990634B2 (en) | 2001-04-27 | 2006-01-24 | The United States Of America As Represented By The National Security Agency | Method of summarizing text by sentence extraction |
US20020169770A1 (en) * | 2001-04-27 | 2002-11-14 | Kim Brian Seong-Gon | Apparatus and method that categorize a collection of documents into a hierarchy of categories that are defined by the collection of documents |
US6999971B2 (en) | 2001-05-08 | 2006-02-14 | Verity, Inc. | Apparatus and method for parametric group processing |
US7299219B2 (en) | 2001-05-08 | 2007-11-20 | The Johns Hopkins University | High refresh-rate retrieval of freshly published content using distributed crawling |
US20030046311A1 (en) | 2001-06-19 | 2003-03-06 | Ryan Baidya | Dynamic search engine and database |
US6622211B2 (en) * | 2001-08-15 | 2003-09-16 | Ip-First, L.L.C. | Virtual set cache that redirects store data to correct virtual set to avoid virtual set store miss penalty |
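JP3557605B2 (ja) | 2001-09-19 | 2004-08-25 | International Business Machines Corporation | Sentence segmentation method, and sentence segmentation processing apparatus, machine translation apparatus and program using the same |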
US6877136B2 (en) | 2001-10-26 | 2005-04-05 | United Services Automobile Association (Usaa) | System and method of providing electronic access to one or more documents |
US6763362B2 (en) * | 2001-11-30 | 2004-07-13 | Micron Technology, Inc. | Method and system for updating a search engine |
US7249034B2 (en) | 2002-01-14 | 2007-07-24 | International Business Machines Corporation | System and method for publishing a person's affinities |
US6829606B2 (en) * | 2002-02-14 | 2004-12-07 | Infoglide Software Corporation | Similarity search engine for use with relational databases |
US7949648B2 (en) | 2002-02-26 | 2011-05-24 | Soren Alain Mortensen | Compiling and accessing subject-specific information from a computer network |
US7243301B2 (en) | 2002-04-10 | 2007-07-10 | Microsoft Corporation | Common annotation framework |
US20030225763A1 (en) | 2002-04-15 | 2003-12-04 | Microsoft Corporation | Self-improving system and method for classifying pages on the world wide web |
US7080091B2 (en) | 2002-05-09 | 2006-07-18 | Oracle International Corporation | Inverted index system and method for numeric attributes |
US7096208B2 (en) | 2002-06-10 | 2006-08-22 | Microsoft Corporation | Large margin perceptrons for document categorization |
US20040128615A1 (en) | 2002-12-27 | 2004-07-01 | International Business Machines Corporation | Indexing and querying semi-structured documents |
US7051023B2 (en) * | 2003-04-04 | 2006-05-23 | Yahoo! Inc. | Systems and methods for generating concept units from search queries |
US7197497B2 (en) * | 2003-04-25 | 2007-03-27 | Overture Services, Inc. | Method and apparatus for machine learning a document relevance function |
US7516146B2 (en) | 2003-05-15 | 2009-04-07 | Microsoft Corporation | Fast adaptive document filtering |
US7139752B2 (en) | 2003-05-30 | 2006-11-21 | International Business Machines Corporation | System, method and computer program product for performing unstructured information management and automatic text analysis, and providing multiple document views derived from different document tokenizations |
US20040243560A1 (en) | 2003-05-30 | 2004-12-02 | International Business Machines Corporation | System, method and computer program product for performing unstructured information management and automatic text analysis, including an annotation inverted file system facilitating indexing and searching |
US7146361B2 (en) | 2003-05-30 | 2006-12-05 | International Business Machines Corporation | System, method and computer program product for performing unstructured information management and automatic text analysis, including a search operator functioning as a Weighted AND (WAND) |
US20040243556A1 (en) | 2003-05-30 | 2004-12-02 | International Business Machines Corporation | System, method and computer program product for performing unstructured information management and automatic text analysis, and including a document common analysis system (CAS) |
US20040243554A1 (en) | 2003-05-30 | 2004-12-02 | International Business Machines Corporation | System, method and computer program product for performing unstructured information management and automatic text analysis |
US7188254B2 (en) | 2003-08-20 | 2007-03-06 | Microsoft Corporation | Peer-to-peer authorization method |
US6934634B1 (en) | 2003-09-22 | 2005-08-23 | Google Inc. | Address geocoding |
US6906920B1 (en) | 2003-09-29 | 2005-06-14 | Google Inc. | Drive cooling baffle |
US6870095B1 (en) | 2003-09-29 | 2005-03-22 | Google Inc. | Cable management for rack mounted computing system |
US6845009B1 (en) | 2003-09-30 | 2005-01-18 | Google Inc. | Cooling baffle and fan mount apparatus |
US7849063B2 (en) | 2003-10-17 | 2010-12-07 | Yahoo! Inc. | Systems and methods for indexing content for fast and scalable retrieval |
US7620624B2 (en) | 2003-10-17 | 2009-11-17 | Yahoo! Inc. | Systems and methods for indexing content for fast and scalable retrieval |
US20050144241A1 (en) | 2003-10-17 | 2005-06-30 | Stata Raymond P. | Systems and methods for a search-based email client |
US7693824B1 (en) | 2003-10-20 | 2010-04-06 | Google Inc. | Number-range search system and method |
US20050149499A1 (en) | 2003-12-30 | 2005-07-07 | Google Inc., A Delaware Corporation | Systems and methods for improving search quality |
US8150824B2 (en) | 2003-12-31 | 2012-04-03 | Google Inc. | Systems and methods for direct navigation to specific portion of target document |
US20050149851A1 (en) | 2003-12-31 | 2005-07-07 | Google Inc. | Generating hyperlinks and anchor text in HTML and non-HTML documents |
US7424467B2 (en) | 2004-01-26 | 2008-09-09 | International Business Machines Corporation | Architecture for an indexer with fixed width sort and variable width sort |
US7499913B2 (en) | 2004-01-26 | 2009-03-03 | International Business Machines Corporation | Method for handling anchor text |
US8296304B2 (en) | 2004-01-26 | 2012-10-23 | International Business Machines Corporation | Method, system, and program for handling redirects in a search engine |
US7293005B2 (en) | 2004-01-26 | 2007-11-06 | International Business Machines Corporation | Pipelined architecture for global analysis and index building |
US7318075B2 (en) | 2004-02-06 | 2008-01-08 | Microsoft Corporation | Enhanced tabular data stream protocol |
US8688143B2 (en) | 2004-08-24 | 2014-04-01 | Qualcomm Incorporated | Location based service (LBS) system and method for creating a social network |
US7461064B2 (en) | 2004-09-24 | 2008-12-02 | International Business Machines Corporation | Method for searching documents for ranges of numeric values |
US20060129538A1 (en) | 2004-12-14 | 2006-06-15 | Andrea Baader | Text search quality by exploiting organizational information |
US8417693B2 (en) | 2005-07-14 | 2013-04-09 | International Business Machines Corporation | Enforcing native access control to indexed documents |
US7840542B2 (en) | 2006-02-06 | 2010-11-23 | International Business Machines Corporation | Method and system for controlling access to semantic web statements |
- 2004-01-26, US application US10/764,801 (US7499913B2): Active
- 2005-01-26, PCT application PCT/EP2005/050321 (WO2005071566A1): Application Discontinuation
- 2005-01-26, CN application CNA2005800018061 (CN1906614A): Pending
- 2005-01-26, JP application JP2006550184A (JP2007519111A): Pending
- 2005-01-26, EP application EP05701609A (EP1714223A1): Withdrawn
- 2008-12-03, US application US12/327,777 (US8285724B2): Active
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
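CN107132963A (zh) * | 2017-05-08 | 2017-09-05 | Shenzhen Lexin Software Technology Co., Ltd. | Red-dot message display method, clearing method, and corresponding apparatus |
CN107132963B (zh) * | 2017-05-08 | 2020-09-08 | Shenzhen Lexin Software Technology Co., Ltd. | Red-dot message display method, clearing method, and corresponding apparatus |
CN111625615A (zh) * | 2019-02-27 | 2020-09-04 | International Business Machines Corporation | Text extraction and processing |
CN111625615B (zh) * | 2019-02-27 | 2023-08-01 | International Business Machines Corporation | Method and system for processing text data |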
Also Published As
Publication number | Publication date |
---|---|
JP2007519111A (ja) | 2007-07-12 |
US8285724B2 (en) | 2012-10-09 |
US20050165781A1 (en) | 2005-07-28 |
US7499913B2 (en) | 2009-03-03 |
WO2005071566A1 (en) | 2005-08-04 |
EP1714223A1 (en) | 2006-10-25 |
US20090083270A1 (en) | 2009-03-26 |
Similar Documents
Publication | Title |
---|---|
CN1906614A (zh) | Method, system and program for handling anchor text |
AU2010343183B2 (en) | Search suggestion clustering and presentation |
JP5492187B2 (ja) | Search result ranking using edit distance and document information |
Adar et al. | The web changes everything: understanding the dynamics of web content |
US7783626B2 (en) | Pipelined architecture for global analysis and index building |
Guerbas et al. | Effective web log mining and online navigational pattern prediction |
US9092510B1 (en) | Modifying search result ranking based on a temporal element of user feedback |
US8898150B1 (en) | Collecting image search event information |
US20170068740A1 (en) | Method and system for web searching |
US20070100797A1 (en) | Indication of exclusive items in a result set |
US20090287676A1 (en) | Search results with word or phrase index |
KR20070098521A (ko) | System and method for prioritizing websites during a web crawling process |
US20090303238A1 (en) | Identifying on a graphical depiction candidate points and top-moving queries |
WO2012051470A1 (en) | Systems and methods for using a behavior history of a user to augment content of a webpage |
KR20110009198A (ko) | Search results with most-clicked next objects |
US20100179953A1 (en) | Information presentation system, information presentation method, and program for information presentation |
EP1993045A1 (en) | Electronic document retrieval system |
EP1975816A1 (en) | Electronic document retrieval system |
US20050165800A1 (en) | Method, system, and program for handling redirects in a search engine |
US20020152242A1 (en) | System for monitoring the usage of intranet portal modules |
Sharma et al. | Web search result optimization by mining the search engine query logs |
KR100667917B1 (ko) | Method and system for providing a website search service |
JP2003271648A (ja) | Search apparatus, search method, and program |
KR100942902B1 (ko) | Web page search method and computer-readable recording medium storing a program for executing the method on a computer |
Pandian et al. | A Unified Model for Preprocessing and Clustering Technique for Web Usage Mining |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
C02 | Deemed withdrawal of patent application after publication (patent law 2001) | |
WD01 | Invention patent application deemed withdrawn after publication | |