CN101253498A - Learning facts from semi-structured text - Google Patents

Learning facts from semi-structured text

Info

Publication number
CN101253498A
Authority
CN
China
Prior art keywords
document
property value
value
context pattern
seed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CNA2006800280576A
Other languages
English (en)
Other versions
CN101253498B (zh)
Inventor
Shubin Zhao (赵树彬)
Jonathan T. Betz (乔纳森·T·贝茨)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Google LLC
Original Assignee
Google LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Google LLC
Publication of CN101253498A
Application granted
Publication of CN101253498B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G PHYSICS
        • G06 COMPUTING; CALCULATING OR COUNTING
            • G06F ELECTRIC DIGITAL DATA PROCESSING
                • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
                    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
                        • G06F16/35 Clustering; Classification
                            • G06F16/353 Clustering; Classification into predefined classes
                    • G06F16/80 Information retrieval; Database structures therefor; File system structures therefor of semi-structured data, e.g. markup language structured data such as SGML, XML or HTML
                        • G06F16/84 Mapping; Conversion
                    • G06F16/90 Details of database functions independent of the retrieved data types
                        • G06F16/93 Document management systems
                        • G06F16/95 Retrieval from the web
                            • G06F16/951 Indexing; Web crawling techniques
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
        • Y10 TECHNICAL SUBJECTS COVERED BY FORMER USPC
            • Y10S TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
                • Y10S707/00 Data processing: database and file management or data structures
                    • Y10S707/953 Organization of data
                        • Y10S707/962 Entity-attribute-value

Abstract

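A method and system for learning, or bootstrapping, facts from semi-structured text are described. The process starts with a set of seed facts associated with an object and identifies documents associated with that object. Each identified document is examined to determine whether it contains at least a first predetermined number of the seed facts. If it does, the contextual patterns associated with the seed facts are identified, along with other instances of content in the document that match those patterns. If the document includes at least a second predetermined number of other content instances matching the contextual patterns, facts may be extracted from those other instances.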

Description

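Learning facts from semi-structured text
This application is related to the following applications, each of which is incorporated herein by reference:
U.S. Patent Application No. 11/097,688, "Corroborating Facts Extracted from Multiple Sources," filed March 31, 2005;
U.S. Patent Application No. 11/097,690, "Selecting the Best Answer to a Fact Query from a Set of Possible Answers," filed March 31, 2005;
U.S. Patent Application No. 11/097,689, "User Interface for Facts Query Engine with Snippets from Information Sources that Include Query Terms and Answer Terms," filed March 31, 2005;
U.S. Patent Application No. 11/142,740, "Merging Objects in a Facts Database," filed May 31, 2005;
U.S. Patent Application No. 11/142,748, "System for Ensuring the Internal Consistency of a Fact Repository," filed May 31, 2005;
U.S. Patent Application No. 11/142,765, "Identifying the Unifying Subject of a Set of Facts," filed May 31, 2005.
TECHNICAL FIELD
The disclosed embodiments relate generally to fact databases. More particularly, they relate to learning facts from documents that present factual information in semi-structured text.
BACKGROUND
The World Wide Web (also known as the "Web") and the web pages within it are a vast source of factual information. Users may consult web pages for answers to factual questions such as "What is the capital of Poland?" or "What is George Washington's birth date?". Factual information included in web pages may be extracted and stored in a fact database.
Factual information can be extracted from web pages by automated processes. Such processes, however, are not perfect. They may miss some factual information, and they may misidentify non-factual information as factual and extract it. They may also extract incorrect factual information, either because the information on the web page was wrong to begin with or because the automated process misinterpreted it. Missed factual information reduces the coverage of the fact database, and incorrect facts reduce its quality.
SUMMARY
According to one aspect of the invention, a method of learning facts includes the following steps:
accessing an object having a name and one or more seed attribute-value pairs;
identifying a set of documents associated with the object name, where each document in the set contains at least a first predetermined number of the object's seed attribute-value pairs;
for each document in the identified set, identifying in the document a contextual pattern associated with the seed attribute-value pairs it contains;
verifying that the document includes at least a second predetermined number of additional instances of content matching the contextual pattern;
when the verification succeeds, extracting attribute-value pairs from the content instances that match the contextual pattern, and merging the extracted pairs into the object.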
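BRIEF DESCRIPTION OF THE DRAWINGS
Figure 1 illustrates a network, according to some embodiments of the invention.
Figure 2 is a flow diagram illustrating a process for learning facts, according to some embodiments of the invention.
Figure 3 illustrates a data structure for an object and its associated facts in a fact repository, according to some embodiments of the invention.
Figure 4 illustrates a document processing system, according to some embodiments of the invention.
Like reference numerals refer to corresponding parts throughout the drawings.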
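DESCRIPTION OF EMBODIMENTS
A bootstrapping process can verify facts in a fact repository and can discover and extract additional facts. The process works as follows:
1. Starting with one or more seed facts associated with an object, identify documents that are associated with the object and include at least a predetermined number of the seed facts.
2. Identify the contextual patterns surrounding the seed facts in these documents.
3. Use those patterns to find other content in the documents that has the same contextual pattern.
4. Identify facts from that other content.
The identified facts may be added to the fact repository or used to verify facts already in it. In other words, bootstrapping uses the facts already in the repository both to verify facts and to find additional facts to add.
Figure 1 illustrates a network 100, according to some embodiments of the invention. Network 100 includes one or more document hosts 102 and a fact repository engine 106. Network 100 also includes one or more networks 104 that couple these components.
Document hosts 102 store documents and provide access to them. A document may be any machine-readable data, including any combination of text, graphics, multimedia content, and so on. In some embodiments, a document is a web page: a combination of text, graphics, and possibly other forms of information written in the Hypertext Markup Language (HTML). A document may include one or more hyperlinks to other documents, and may include one or more facts within its contents. A document stored in a document host 102 may be located and/or identified by a Uniform Resource Locator (URL), a web address, or any other appropriate form of identification and/or location.

Each document may also be associated with a page importance metric, which measures the document's importance, popularity, or reputation relative to other documents. In some embodiments, the page importance metric is the document's PageRank. For more information on the PageRank metric and its computation, see, for example, the following, each of which is hereby incorporated by reference in its entirety as background information:
Page et al., "The PageRank citation ranking: Bringing order to the web," Stanford Digital Libraries Working Paper, 1998;
Haveliwala, "Topic-sensitive PageRank," 11th International World Wide Web Conference, Honolulu, Hawaii, May 7-11, 2002;
Richardson and Domingos, "The Intelligent Surfer: Probabilistic Combination of Link and Content Information in PageRank," Vol. 14, MIT Press, Cambridge, MA, 2002;
Jeh and Widom, "Scaling personalized web search," 12th International World Wide Web Conference, Budapest, Hungary, May 20-24, 2002;
Brin and Page, "The Anatomy of a Large-Scale Hypertextual Search Engine," 7th International World Wide Web Conference, Brisbane, Australia, April 14-18, 1998;
U.S. Patent No. 6,285,999.
The fact repository engine 106 includes an importer 108, a repository manager 110, a fact index 112, and a fact repository 114. The importer 108 extracts factual information from documents stored on document hosts 102. It analyzes the contents of those documents and determines whether the contents include factual information and, if so, which subject or subjects that information is associated with. It then extracts any available factual information from the contents.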
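The repository manager 110 processes the facts extracted by the importer 108 and builds and manages the fact repository 114 and the fact index 112. It receives facts from the importer 108 and stores them in the fact repository 114. It may also perform operations on facts in the fact repository 114 to "clean up" the data. For example, it may search the fact repository 114 for duplicate facts (facts that convey exactly the same factual information) and merge them. It may also normalize facts into a standard format, and it may remove unwanted facts, such as facts meeting predefined objectionable-content criteria.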
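As a rough illustration of the duplicate-merging step, the following Python sketch merges facts that convey the same information and takes the union of their source lists. The `normalize` rule (collapse whitespace, then lowercase) and the `(attribute, value, sources)` tuple layout are assumptions made for this example. They are not the repository manager's actual normalization.

```python
def normalize(text):
    # Hypothetical normalization: collapse runs of whitespace and lowercase.
    return " ".join(text.split()).lower()


def merge_duplicate_facts(facts):
    """Merge facts that convey the same information.

    facts: iterable of (attribute, value, sources) tuples.
    Returns one (attribute, value, sources) tuple per distinct fact, keeping the
    first-seen spelling and the union of sources in first-seen order.
    """
    merged = {}
    for attribute, value, sources in facts:
        key = (normalize(attribute), normalize(value))
        if key not in merged:
            merged[key] = (attribute, value, [])
        kept_sources = merged[key][2]
        for src in sources:
            if src not in kept_sources:
                kept_sources.append(src)
    return list(merged.values())
```

Keying on the normalized attribute and value means two facts are duplicates only when both parts agree. A stricter or looser notion of "the same factual information" would change only `normalize`.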
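The fact repository 114 stores factual information extracted from many documents on the document hosts 102. In other words, the fact repository 114 is a database of factual information. A document from which a particular fact can be extracted is a source document (or "source") of that fact; that is, a fact's source contains the fact. Source documents may include, without limitation, web pages. Within the fact repository 114, the entities, concepts, and so on for which it may store factual information are represented as objects. An object may have one or more facts associated with it; each object is a collection of facts. In some embodiments, an object with no associated facts (an empty object) may be treated as not existing in the fact repository 114. Within each object, each associated fact is stored as an attribute-value pair. Each fact also includes a list of the source documents that contain the fact and from which it was extracted. Further details about objects and facts in the fact repository are described below in relation to Figure 3.
The fact index 112 indexes the fact repository 114 and enables efficient lookup of information in it. The fact index 112 may index the fact repository 114 by one or more parameters. For example, it may map unique terms (e.g., words and numbers) to records or locations in the fact repository 114. More specifically, the fact index 112 may include entries mapping every term in every object name, fact attribute, and fact value in the repository to records or locations within the repository.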
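A term-to-location index of the kind described above might look like the following minimal sketch. Two choices here are hypothetical and made only for illustration. The first is the repository layout: a dict of objects, each holding `names` and `fact_id -> (attribute, value)` entries. The second is the `\w+` tokenizer.

```python
import re
from collections import defaultdict

TOKEN = re.compile(r"\w+")  # assumed tokenizer: runs of word characters


def build_fact_index(repository):
    """Map each term to the set of places where it occurs.

    repository: dict object_id -> {"names": [str, ...],
                                   "facts": {fact_id: (attribute, value)}}
    Returns term -> set of (object_id, field, fact_id) tuples, where field is
    "name", "attribute", or "value" (fact_id is None for names).
    """
    index = defaultdict(set)
    for object_id, obj in repository.items():
        for name in obj["names"]:
            for term in TOKEN.findall(name.lower()):
                index[term].add((object_id, "name", None))
        for fact_id, (attribute, value) in obj["facts"].items():
            for field, text in (("attribute", attribute), ("value", value)):
                for term in TOKEN.findall(text.lower()):
                    index[term].add((object_id, field, fact_id))
    return index
```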
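Each component of the fact repository engine 106 may be distributed across multiple computers. For example, the fact repository 114 may be deployed on N servers, with a mapping function such as a "modulo N" function determining which facts are stored on each server. Similarly, the fact index 112 may be distributed across multiple servers, and the importer 108 and the repository manager 110 may each be distributed across multiple computers. For ease of explanation, however, the components of the fact repository engine 106 are discussed below as if they were implemented on a single computer.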
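A minimal sketch of "modulo N" placement follows. It rests on two assumptions that the text does not state. First, facts are routed by object ID, so all facts of one object land on the same server. Second, `zlib.crc32` serves as the hash, because it is stable across processes, unlike Python's built-in `hash` for strings.

```python
import zlib


def shard_for(object_id, num_servers):
    # Stable hash of the object ID, reduced modulo the number of servers.
    return zlib.crc32(object_id.encode("utf-8")) % num_servers


def partition(object_ids, num_servers):
    """Group object IDs by the server that would store their facts."""
    shards = {i: [] for i in range(num_servers)}
    for oid in object_ids:
        shards[shard_for(oid, num_servers)].append(oid)
    return shards
```

Plain modulo placement is simple, but changing N remaps most keys. Consistent hashing is a common alternative when servers are added or removed often.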
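Figure 2 is a flow diagram illustrating a process for learning facts, according to some embodiments of the invention. First, an object is identified that has one or more facts expressible as attribute-value pairs (hereinafter "A-V pairs") (202). Objects and A-V pairs are described in further detail below in relation to Figure 3. The identified object may be an object in the fact repository. The A-V pairs associated with the object include one or more seed A-V pairs (seed facts).
Next, documents associated with the object are identified (204). This may be done by running a search with the object name as the search query. In some embodiments, the search covers documents accessible via the World Wide Web that include the object name; in other words, a web search is performed for documents matching the object name. The search may be performed with a search engine, such as a web search engine. If the object has multiple names (as described below in relation to Figure 3), then in some embodiments one of the names (e.g., the primary name) is used as the search query.
The seed A-V pairs may be all of the A-V pairs associated with the identified object, or a subset of them. In other words, the object's seed A-V pairs are at least a subset of its set of one or more A-V pairs. Which of the object's A-V pairs serve as seeds may depend on predefined criteria. For example, the seeds may be A-V pairs whose source lists name more than one source. As another example, the seeds may be A-V pairs whose confidence values exceed a predefined confidence threshold. More generally, the seeds may be A-V pairs that are considered reliable.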
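The seed-selection criteria can be sketched as follows. The `AVPair` class, the default of two sources, and the optional confidence threshold are all illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class AVPair:
    attribute: str
    value: str
    sources: list = field(default_factory=list)  # source document IDs or URLs
    confidence: float = 0.0


def select_seeds(pairs, min_sources=2, confidence_threshold=None):
    """Pick seed A-V pairs under two example criteria: at least `min_sources`
    listed sources, or (if a threshold is given) confidence above it."""
    seeds = []
    for p in pairs:
        if len(p.sources) >= min_sources:
            seeds.append(p)
        elif confidence_threshold is not None and p.confidence > confidence_threshold:
            seeds.append(p)
    return seeds
```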
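One of the identified documents is selected (206) and checked for whether its contents include at least a first predetermined number ("M" in Figure 2) of distinct values from the seed A-V pairs. In other words, a validity check is performed on the selected document. One validity requirement is that the document must contain at least M distinct seed A-V pair values. For convenience, the values of the seed A-V pairs are hereinafter called "seed values." In some embodiments M is 2; in others, M is an integer greater than 2. In some embodiments, the validity requirement may instead be that the document contains M distinct facts corresponding to M distinct seed A-V pairs.
In some embodiments, additional validity requirements may include:
whether the seed values in the document are close to or far from one another;
whether the seed values lie in the same region of the document (e.g., the same frame of a web page);
whether the A-V pairs containing the seed values have similar HTML markup.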
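The validity check (at least M distinct seed values, optionally close together) might be sketched as below. This sketch makes two simplifying assumptions. Seed values are located by plain substring search. "Proximity" means that some M of the seed values fall within `max_span` characters of one another.

```python
def seed_value_positions(text, seed_values):
    """Return {seed_value: offset of its first occurrence} for values found in text."""
    positions = {}
    for value in set(seed_values):
        offset = text.find(value)
        if offset != -1:
            positions[value] = offset
    return positions


def is_valid_document(text, seed_values, m=2, max_span=None):
    """True if the document holds at least m distinct seed values and, when
    max_span is given, some m of them lie within max_span characters."""
    positions = seed_value_positions(text, seed_values)
    if len(positions) < m:
        return False
    if max_span is not None:
        offsets = sorted(positions.values())
        # Slide a window of m consecutive offsets and test its width.
        return any(offsets[i + m - 1] - offsets[i] <= max_span
                   for i in range(len(offsets) - m + 1))
    return True
```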
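Suppose the document is invalid, because it does not include at least M seed values and/or fails the other validity requirements (208-No). If other documents are still awaiting validation (224-No), another document may be selected for validation (206). If there are no more documents to validate (224-Yes), the process ends (226).
If the selected document is valid (208-Yes), one or more contextual patterns surrounding the content that contains the seed values are identified (210). A contextual pattern is the visual structure of the content containing the seed values, together with nearby content, that provides context for those values. For example, the contextual pattern may be a table or a list. In some embodiments, the contextual pattern is identified from the HTML markup associated with the content containing the seed values and with content near them. HTML markup defines how a client application renders content for presentation to a user; that is, it defines the content's visual structure. For example, the seed values may appear in a list of attributes and associated values with the following HTML markup: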
<b>Name:</b>Marilyn Monroe<br>
<b>Born:</b>June 1,1926<br>
<b>Died:</b>August 5,1962<br>
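Here the "<b>" and "</b>" tags specify that the text between them is rendered in bold, and the "<br>" tag inserts a line break between successive entries in the list.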
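For the specific list format shown above, the attribute-value pairs can be read off with a regular expression. The pattern below is a hand-written approximation of that one layout. It is not a general contextual-pattern learner.

```python
import re

# Assumed regex for the "<b>attribute:</b>value<br>" layout; tolerates extra
# whitespace and the XHTML form "<br/>".
BOLD_LABEL = re.compile(
    r"<b>\s*([^<:]+?)\s*:\s*</b>\s*([^<]+?)\s*<br\s*/?>", re.IGNORECASE
)


def parse_bold_label_list(html):
    """Return (attribute, value) tuples for each list entry, in document order."""
    return [(m.group(1), m.group(2)) for m in BOLD_LABEL.finditer(html)]
```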
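In some embodiments, several contextual patterns may be identified for the seed values in a document, because not all of the seed values necessarily share the same pattern. For example, some seed values may appear in a list and others in a table. One contextual pattern may then be identified for some of the seed values and another for the rest. More generally, one or more contextual patterns may be identified, each surrounding at least one seed value.
In some embodiments, generating an HTML tag tree for the document makes contextual patterns easier to identify. An HTML tag tree is a tree data structure that mirrors the nested structure of the HTML tags in a document. By generating the tag tree and locating the content containing the seed value within it, the HTML markup that forms that content's contextual pattern can be identified.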
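A full HTML tag tree is not strictly needed to get a usable signature. A lightweight stand-in records, for each text node, the path of open tags above it and the tag that closed immediately before it. The sketch below does this with Python's standard `html.parser` module. Two things here are illustrative assumptions: treating `(tag_path, previous_closed_tag)` as the contextual pattern, and the contents of the `VOID_TAGS` set.

```python
from html.parser import HTMLParser

# Tags that never get a closing tag, so they are not pushed on the stack.
VOID_TAGS = {"br", "img", "hr", "meta", "link", "input"}


class TextPathParser(HTMLParser):
    """Collects (text, tag_path, previous_closed_tag) for each non-blank text node."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.last_closed = None
        self.nodes = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            self.last_closed = tag
            return
        self.stack.append(tag)
        self.last_closed = None

    def handle_endtag(self, tag):
        if tag in self.stack:
            # Pop up to and including the matching tag (tolerates unclosed tags).
            while self.stack and self.stack.pop() != tag:
                pass
            self.last_closed = tag

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.nodes.append((text, tuple(self.stack), self.last_closed))


def context_pattern(html, seed_value):
    """Return the (tag_path, previous_closed_tag) signature of the first text
    node equal to seed_value, or None if the value is not found."""
    parser = TextPathParser()
    parser.feed(html)
    parser.close()
    for text, path, prev in parser.nodes:
        if text == seed_value:
            return (path, prev)
    return None
```

In the bold-label list, every value node shares the signature `(("div",), "b")`, while the labels share a different one. Comparing signatures is therefore enough to separate values from labels in this layout.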
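Other instances of the identified contextual pattern (or patterns) are then located in the document (212) by searching it for matches. The HTML tag tree may be used to find content with a matching pattern. For example, if the contextual pattern is "<b>attribute:</b>value<br>", the other instances may be nearby occurrences of "<b>attribute:</b>value<br>", such as other items in the same list. As another example, if the contextual pattern is a table, the other instances may be other entries in the table that contains the seed values. In some embodiments, these additional instances are distinct instances of the pattern, representing facts that differ from one another and from the facts represented by the seed A-V pairs.
The number of other instances matching the contextual pattern is then compared against a second predetermined number ("N" in Figure 2). If there are fewer than N (214-No), processing of the selected, validated document ends. In some embodiments N is 2; in others, N is an integer greater than 2. If any other documents remain to be validated (224-No), another document may be selected for validation and processing (206). If there are no more documents to validate (224-Yes), the process ends (226).
In some embodiments, the N instances matching the contextual pattern exclude the instances associated with the seed values from which the pattern was identified. In other words, the document is checked for N instances of matching content in addition to the instances associated with its seed values. In other embodiments, the N instances include the instances associated with the seed values; that is, the one or more seed instances from which the pattern was identified may count toward N. Furthermore, in some embodiments the additional matching instances must be close to one another in the document: they must be contiguous, or at most a predefined distance apart.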
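Suppose the text nodes are already labeled with their contextual patterns, in document order. The N-instance test might then be sketched as follows. It can either exclude or count the seed instances, and it has an optional closeness requirement. The node representation and the `max_gap` contiguity rule, which measures gaps between node indices, are hypothetical.

```python
def matching_instances(nodes, pattern, seed_values, include_seeds=False):
    """nodes: list of (text, pattern) in document order.
    Returns indices of nodes whose pattern equals `pattern`, skipping nodes
    whose text is a seed value unless include_seeds is True."""
    return [i for i, (text, pat) in enumerate(nodes)
            if pat == pattern and (include_seeds or text not in seed_values)]


def passes_instance_check(nodes, pattern, seed_values, n=2,
                          include_seeds=False, max_gap=None):
    """True if at least n matching instances exist; with max_gap, they must
    form a run whose successive node indices differ by at most max_gap."""
    idx = matching_instances(nodes, pattern, seed_values, include_seeds)
    if max_gap is not None:
        best, run = [], []
        for i in idx:
            if run and i - run[-1] > max_gap:
                run = []  # gap too large: start a new run
            run.append(i)
            if len(run) > len(best):
                best = list(run)
        idx = best
    return len(idx) >= n
```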
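In some embodiments, if several contextual patterns were identified at 210, the test at 214 asks whether the document includes at least N instances of at least one of those patterns. If no pattern has N matching instances in the document (214-No), processing of that document ends. If at least one pattern has at least N matching instances (214-Yes), facts expressible as A-V pairs may be extracted from each pattern that has at least N matching instances, as described below.
If the document does have at least N additional instances of content matching the contextual pattern or patterns (214-Yes), facts expressible as A-V pairs are identified and extracted from those other instances (216). An extracted A-V pair may be new to the object. Alternatively, it may already be associated with the object and stored in the fact repository 114 (a pre-existing A-V pair). A pre-existing A-V pair is not stored again in the fact repository 114; instead, its source list in the fact repository 114 is updated (218). The source list, described further below in relation to Figure 3, lists the documents whose contents include the fact represented by the A-V pair. New A-V pairs are merged into the object (220) and stored in the fact repository 114, and each new pair also carries a source list.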
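The merge step (218/220) might look like the sketch below. Here the object's A-V pairs are kept in a dict that maps `(attribute, value)` to a source list. Matching pairs by exact string equality is an assumption; a real system would likely normalize values before comparing them.

```python
def merge_extracted(obj_facts, extracted, source_url):
    """Merge A-V pairs extracted from one document into an object.

    obj_facts: dict (attribute, value) -> list of source URLs.
    extracted: iterable of (attribute, value) pairs from the document.
    Returns the pairs that were new to the object.
    """
    new_pairs = []
    for pair in extracted:
        if pair in obj_facts:
            # Pre-existing pair: only its source list is updated (218).
            if source_url not in obj_facts[pair]:
                obj_facts[pair].append(source_url)
        else:
            # New pair: merged into the object with its own source list (220).
            obj_facts[pair] = [source_url]
            new_pairs.append(pair)
    return new_pairs
```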
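A confidence value may be determined for each A-V pair (222). In some embodiments, the confidence value is simply the number of documents whose contents include the A-V pair, that is, the number of sources in the pair's source list. In other embodiments, the confidence value is a count of the sources that include the A-V pair, weighted by each source document's page importance metric. In other words, the confidence value is:

$$\mathrm{confidence}(p) = \sum_{d \in S(p)} w(d)$$

where $S(p)$ is the source list of A-V pair $p$ and $w(d)$ is the page importance metric (e.g., the PageRank) of source document $d$.
More generally, the confidence value may be based on the number of sources in the source list and on other factors.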
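Both confidence variants described above are straightforward to express. In this sketch, `page_importance` is a hypothetical mapping from a source identifier to a page importance score such as PageRank. Any source missing from the mapping contributes `default`.

```python
def confidence_by_count(sources):
    # Plain count of the documents that contain the A-V pair.
    return len(sources)


def confidence_weighted(sources, page_importance, default=0.0):
    # Each source contributes its page importance score instead of 1.
    return sum(page_importance.get(s, default) for s in sources)
```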
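Once the A-V pairs from the additional content instances have been extracted and processed, another document is selected (206) if other documents associated with the object remain to be validated (224-No). Otherwise (224-Yes), the process ends (226). The process may be run again later, however, to learn additional facts or to verify facts associated with the object. Seed facts for later runs may be drawn both from the A-V pairs merged into the object as described above and from the facts that were already associated with the object when the process began. That is, new and pre-existing A-V pairs alike may serve as seed A-V pairs for later runs. The process may be run on demand or at predefined intervals, and it may also be run for other objects in the fact repository.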
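Putting the steps together, one simplified pass of the Figure 2 loop might look like the sketch below. To keep it short, it assumes a single fixed contextual pattern (the `<b>attribute:</b>value<br>` list) instead of learning patterns per document. It treats every pair already attached to the object as a seed, so facts learned in one pass can seed the next.

```python
import re

# Assumed, fixed contextual pattern: the "<b>attribute:</b>value<br>" list layout.
PATTERN = re.compile(r"<b>([^<:]+):</b>([^<]+)<br>")


def learn_facts(facts, documents, m=2, n=2):
    """Run one simplified pass of the Figure 2 loop.

    facts: dict mapping (attribute, value) -> set of source document IDs;
        every pair already in `facts` is treated as a seed.
    documents: dict mapping document ID -> HTML text.
    Mutates and returns `facts`.
    """
    seeds = set(facts)
    seed_values = {value for _, value in seeds}
    for doc_id, html in documents.items():
        # 206/208: the document must contain at least M distinct seed values.
        if len({v for v in seed_values if v in html}) < m:
            continue
        instances = [(a.strip(), v.strip()) for a, v in PATTERN.findall(html)]
        # 210: the pattern only counts if it actually surrounds a seed value.
        if not any(v in seed_values for _, v in instances):
            continue
        # 212/214: require at least N instances beyond the seed pairs themselves.
        if len([p for p in instances if p not in seeds]) < n:
            continue
        # 216-220: record this document as a source; new pairs are added.
        for pair in instances:
            facts.setdefault(pair, set()).add(doc_id)
    return facts
```

The seed set is rebuilt from `facts` on every call. As a result, a document that shares no values with the original seeds can still qualify on a later pass, once earlier passes have added facts it does contain.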
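Figure 3 illustrates an exemplary data structure for an object in the fact repository 114, according to some embodiments of the invention. As described above, the fact repository 114 contains objects, each of which may include one or more facts. Each object 300 includes a unique identifier, such as the object ID 302, and one or more facts 304. Each fact 304 includes a unique identifier for that fact, such as a fact ID 310, plus an attribute 312 and a value 314. For example, an object representing George Washington may include facts with the attributes "date of birth" and "date of death," whose values are his actual dates of birth and death.

A fact 304 may include a link 316 to another object. The link 316 is an object identifier, such as the object ID 302 of another object in the fact repository 114, and it lets an object have facts whose values are other objects. For example, the object "United States" may have a fact with the attribute "president" whose value is "George W. Bush," where "George W. Bush" is itself another object in the fact repository 114. In some embodiments, the value field 314 stores the name of the linked object and the link 316 stores its object identifier. In other embodiments, a fact 304 has no link field 316, because its value 314 can itself store a link to another object.
Each fact 304 may also include one or more metrics 318 that indicate the quality of the fact. In some embodiments, the metrics include a confidence level and an importance level. The confidence level indicates how likely the fact is to be correct. The importance level indicates how relevant the fact is to the object, compared with the object's other facts. Alternatively, the importance level may be viewed as a measure of how important the fact is to understanding the entity or concept the object represents.
Each fact 304 includes a list of sources 320 that contain the fact and from which it was extracted. Each source may be identified by a Uniform Resource Locator (URL), a web address, or any other appropriate form of identification and/or location, such as a unique document identifier.
In some embodiments, some facts include an agent field 322 identifying the module that extracted the fact. For example, the agent may be a specialized module that extracts facts from a specific source (e.g., a particular web site, or the pages of a family of related web sites). It may also be a module that handles a particular type of source (e.g., web pages that present factual information in tabular form), or one that extracts facts from free text in documents across the Web, and so on.
In some embodiments, an object 300 may have one or more specialized facts, such as a name fact 306 and a property fact 308. A name fact 306 conveys a name of the entity or concept represented by the object 300. For example, an object representing the country Spain may have a fact conveying the object's name as "Spain." As a special case of a general fact 304, a name fact 306 has the same parameters as any other fact 304: an attribute, a value, a fact ID, metrics, sources, and so on. The attribute 324 of a name fact 306 indicates that the fact is a name fact, and the value is the actual name, which may be a character string.

Because many entities and concepts have more than one name, an object 300 may have one or more name facts. For example, an object representing Spain may have name facts conveying both the country's common name, "Spain," and its official name, "Kingdom of Spain." Likewise, an object representing the U.S. Patent and Trademark Office may have name facts conveying the agency's abbreviations "PTO" and "USPTO" and its official name, "United States Patent and Trademark Office." If an object has more than one name fact, one of them may be designated the primary name and the others secondary names.
A property fact 308 conveys a statement of possible interest about the entity or concept represented by the object 300. For example, for the object representing Spain, a property fact may state that Spain is a country in Europe. As a special case of a general fact 304, a property fact 308 also has the same parameters as other facts 304 (attribute, value, fact ID, and so on). The attribute field 326 of a property fact 308 indicates that the fact is a property fact, and the value field is a text string expressing the statement of interest. For the object representing Spain, that value might be the text string "is a country in Europe." Some objects 300 have one or more property facts; others have none.
The data structures shown in Figure 3 and described above are merely exemplary, and the fact repository 114 may use other forms. Facts may include other fields, and some of the fields described above may be omitted. Beyond name facts and property facts, each object may also have other special facts. One example is a fact conveying a type or category (e.g., person, place, movie, actor, organization) used to classify the entity or concept the object represents. In some embodiments, an object's name(s) and/or properties may be stored as special records whose format differs from that of the general fact records 304 holding the object's attribute-value pairs.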
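The Figure 3 layout can be mirrored with Python dataclasses, as in the sketch below. The reference numerals appear as comments. The following details are assumptions: the `NAME`/`PROPERTY` attribute markers, the field types, and the rule that the first name fact is the primary name.

```python
from dataclasses import dataclass, field
from typing import List, Optional

NAME = "name"          # hypothetical marker stored in a name fact's attribute (324)
PROPERTY = "property"  # hypothetical marker stored in a property fact's attribute (326)


@dataclass
class Fact:                                            # fact 304
    fact_id: str                                       # 310
    attribute: str                                     # 312
    value: str                                         # 314
    link: Optional[str] = None                         # 316: ID of a linked object
    confidence: float = 0.0                            # metrics 318
    importance: float = 0.0                            # metrics 318
    sources: List[str] = field(default_factory=list)   # 320
    agent: Optional[str] = None                        # 322: extracting module


@dataclass
class RepoObject:                                      # object 300
    object_id: str                                     # 302
    facts: List[Fact] = field(default_factory=list)

    def names(self) -> List[str]:
        return [f.value for f in self.facts if f.attribute == NAME]

    def primary_name(self) -> Optional[str]:
        # Assumption: the first name fact is the designated primary name.
        names = self.names()
        return names[0] if names else None

    def is_empty(self) -> bool:
        # An object with no facts may be treated as non-existent.
        return not self.facts
```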
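Figure 4 is a block diagram illustrating a fact learning system 400, according to some embodiments of the invention. The system 400 typically includes one or more processing units (CPUs) 402, one or more network or other communications interfaces 410, memory 412, and one or more communication buses 414 interconnecting these components. The system 400 may optionally include a user interface 404 comprising a display 406, a keyboard 408, and a pointing device 409 such as a mouse, trackball, or touch-sensitive pad. Memory 412 includes high-speed random access memory, such as DRAM, SRAM, DDR RAM, or other random access solid-state memory. It may also include non-volatile memory, such as one or more magnetic disk storage devices, optical disk storage devices, flash memory devices, or other non-volatile solid-state storage. Memory 412 may optionally include one or more storage devices located remotely from the CPU(s) 402. In some embodiments, memory 412 stores the following programs, modules, and data structures, or a subset thereof:
◆ an operating system 416, which includes procedures for handling various basic system services and for performing hardware-dependent tasks;
◆ a network communication module (or instructions) 418, which connects the fact learning system 400 to other computers through the one or more communication network interfaces 410 (wired or wireless) and one or more communication networks, such as the Internet, other wide area networks, local area networks, and metropolitan area networks;
◆ a fact storage interface (or instructions) 420, which connects the fact learning system 400 to a fact storage system 436 (which may include a fact index and a fact repository and/or other appropriate data structures);
◆ an object access module (or instructions) 422, for accessing objects and associated facts stored in the fact storage system 436;
◆ a document identification module (or instructions) 424, for identifying documents associated with an object and identifying the seed facts within those documents;
◆ a pattern identification module (or instructions) 426, for identifying contextual patterns associated with facts in documents;
◆ a pattern matching module (or instructions) 428, for finding instances of content in documents that match the contextual patterns;
◆ a fact extraction module (or instructions) 430, for extracting facts from documents, merging new facts into objects, and updating document lists; and
◆ a confidence module 432, for determining confidence values for facts.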
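In some embodiments, the memory 412 of system 400 contains the fact index itself rather than an interface 420 to it. The system 400 also includes a fact storage system 436 for storing facts. As described above, in some embodiments each fact stored in the fact storage system 436 includes a list of the sources from which it was extracted. The system 400 may also include a search engine 434 for searching for documents and/or for searching for facts in the fact storage system. In other embodiments, however, two systems may be completely separate. The "back-end system" extracts facts from source documents and adds them to the fact storage system 436. The "front-end system" includes a search engine for searching the fact storage system. The front-end system, which is not the subject of this document, may receive copies of the fact repository and fact index built by the back-end system.
At least some of the modules described above may be combined into a single module. For example, modules 426 and 428 may be combined into one pattern module.
Each of the elements described above may be stored in one or more of the memory devices mentioned previously, and each corresponds to a set of instructions for performing a function described above. These modules or programs (i.e., sets of instructions) need not be implemented as separate software programs, procedures, or modules. Various subsets of them may therefore be combined or otherwise rearranged in different embodiments. In some embodiments, memory 412 may store a subset of the modules and data structures described above. Memory 412 may also store additional modules and data structures not described above.
Although Figure 4 shows a "fact learning system," it is intended more as a functional description of the various features that may be present in a set of servers than as a structural schematic of the embodiments described herein. In practice, as those skilled in the art will recognize, items shown separately could be combined and some items could be split apart. For example, some items shown separately in Figure 4 could be implemented on a single server, and a single item could be implemented by one or more servers. Implementations will differ in the number of servers used for a fact learning system and in how features are divided among them. These choices may depend partly on the data traffic the system must handle during peak and average usage periods. They may also depend on the size of the fact repository and on how much fact information each server can handle efficiently.
The foregoing description, for purposes of explanation, has been given with reference to specific embodiments. The illustrative discussion above is not intended to be exhaustive or to limit the invention to the precise forms disclosed, and many modifications and variations are possible in light of the above teachings. The embodiments were chosen and described to best explain the principles of the invention and its practical applications. This enables others skilled in the art to best use the invention and its various embodiments, with such modifications as suit the particular use contemplated.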

Claims (19)

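1. A method of learning facts, comprising:
accessing an object having a name and one or more seed attribute-value pairs;
identifying a set of documents associated with the object name, each document in the set having at least a first predetermined number of distinct seed attribute-value pairs of the object;
for each document in the identified set:
identifying, in the document, a contextual pattern associated with respective seed attribute-value pairs in the document;
verifying that the document includes at least a second predetermined number of additional instances of content matching the contextual pattern; and
when the verifying succeeds, extracting attribute-value pairs from respective instances of content matching the contextual pattern, and merging the extracted attribute-value pairs into the object.
2. The method of claim 1, further comprising repeating the extracting and merging operations for one or more instances of content matching the contextual pattern in the document.
3. The method of claim 1, wherein the extracted and merged attribute-value pairs are distinct from all other attribute-value pairs of the object.
4. The method of claim 1, further comprising:
identifying attribute-value pairs in the document that match respective attribute-value pairs of the object; and
adding an identifier of the document to a list of documents associated with the respective attribute-value pairs of the object.
5. The method of claim 4, further comprising generating, for each attribute-value pair of the object, a confidence value based on the documents in the list of documents associated with that attribute-value pair.
6. The method of claim 4, further comprising generating a confidence value for each attribute-value pair of the object that corresponds to a plurality of documents in the list of documents associated with that attribute-value pair.
7. A system for learning facts, comprising:
one or more modules having instructions for:
accessing an object having a name and one or more seed attribute-value pairs;
identifying a set of documents associated with the object name, each document in the set having at least a first predetermined number of distinct seed attribute-value pairs of the object;
for each document in the identified set:
identifying, in the document, a contextual pattern associated with respective seed attribute-value pairs in the document; and
verifying that the document includes at least a second predetermined number of additional instances of content matching the contextual pattern; and
extracting attribute-value pairs from respective instances of content matching the contextual pattern, and merging the extracted attribute-value pairs into the object.
8. The system of claim 7, wherein the one or more modules include instructions for repeatedly extracting and merging attribute-value pairs from instances of content matching the contextual pattern in the document.
9. The system of claim 7, wherein the extracted and merged attribute-value pairs are distinct from all other attribute-value pairs of the object.
10. The system of claim 7, wherein the one or more modules include instructions for:
identifying attribute-value pairs in the document that match respective attribute-value pairs of the object; and
adding an identifier of the document to a list of documents associated with the respective attribute-value pairs of the object.
11. The system of claim 10, further comprising instructions for generating, for each attribute-value pair of the object, a confidence value based on the documents in the list of documents associated with that attribute-value pair.
12. The system of claim 10, further comprising instructions for generating a confidence value for each attribute-value pair of the object that corresponds to a plurality of documents in the list of documents associated with that attribute-value pair.
13. A computer program product for use in conjunction with a computer system, the computer program product comprising a computer readable storage medium and a computer program mechanism embedded therein, the computer program mechanism comprising instructions for:
accessing an object having a name and one or more seed attribute-value pairs;
identifying a set of documents associated with the object name, each document in the set having at least a first predetermined number of distinct seed attribute-value pairs of the object;
for each document in the identified set:
identifying, in the document, a contextual pattern associated with respective seed attribute-value pairs in the document;
verifying that the document includes at least a second predetermined number of additional instances of content matching the contextual pattern; and
when the verifying succeeds, extracting attribute-value pairs from respective instances of content matching the contextual pattern, and merging the extracted attribute-value pairs into the object.
14. The computer program product of claim 13, further comprising repeating the extracting and merging operations for one or more instances of content matching the contextual pattern in the document.
15. The computer program product of claim 13, wherein the extracted and merged attribute-value pairs are distinct from all other attribute-value pairs of the object.
16. The computer program product of claim 13, further comprising instructions for:
identifying attribute-value pairs in the document that match respective attribute-value pairs of the object; and
adding an identifier of the document to a list of documents associated with the respective attribute-value pairs of the object.
17. The computer program product of claim 16, further comprising instructions for generating, for each attribute-value pair of the object, a confidence value based on the documents in the list of documents associated with that attribute-value pair.
18. The computer program product of claim 16, further comprising instructions for generating a confidence value for each attribute-value pair of the object that corresponds to a plurality of documents in the list of documents associated with that attribute-value pair.
19. A system for learning facts, comprising:
means for accessing an object having a name and one or more seed attribute-value pairs;
means for identifying a set of documents associated with the object name, wherein each document in the set has at least a first predetermined number of distinct seed attribute-value pairs of the object;
means for performing the following for each document in the identified set:
identifying, in the document, a contextual pattern associated with respective seed attribute-value pairs in the document;
verifying that the document includes at least a second predetermined number of additional instances of content matching the contextual pattern; and
when the verifying succeeds, extracting attribute-value pairs from respective instances of content matching the contextual pattern, and merging the extracted attribute-value pairs into the object.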
CN2006800280576A 2005-05-31 2006-05-18 Learning facts from semi-structured text Active CN101253498B (zh)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US11/142,853 2005-05-31
US11/142,853 US7769579B2 (en) 2005-05-31 2005-05-31 Learning facts from semi-structured text
PCT/US2006/019807 WO2006132793A2 (en) 2005-05-31 2006-05-18 Learning facts from semi-structured text

Publications (2)

Publication Number Publication Date
CN101253498A true CN101253498A (zh) 2008-08-27
CN101253498B CN101253498B (zh) 2010-12-08

Family

ID=37309560

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2006800280576A Active CN101253498B (zh) 2005-05-31 2006-05-18 Learning facts from semi-structured text

Country Status (5)

Country Link
US (4) US7769579B2 (zh)
EP (1) EP1891557A2 (zh)
CN (1) CN101253498B (zh)
CA (1) CA2610208C (zh)
WO (1) WO2006132793A2 (zh)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
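CN102200983A (zh) * 2010-03-25 2011-09-28 NEC (China) Co., Ltd. Attribute extraction device and method
CN102662986A (zh) * 2012-01-13 2012-09-12 Institute of Computing Technology, Chinese Academy of Sciences Microblog message retrieval system and method
CN105488105A (zh) * 2015-11-19 2016-04-13 Baidu Online Network Technology (Beijing) Co., Ltd. Method for establishing an information extraction template, and method and device for processing knowledge data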

Families Citing this family (106)

Publication number Priority date Publication date Assignee Title
US8244689B2 (en) 2006-02-17 2012-08-14 Google Inc. Attribute entropy as a signal in object normalization
US7769579B2 (en) 2005-05-31 2010-08-03 Google Inc. Learning facts from semi-structured text
US9208229B2 (en) 2005-03-31 2015-12-08 Google Inc. Anchor text summarization for corroboration
US7587387B2 (en) 2005-03-31 2009-09-08 Google Inc. User interface for facts query engine with snippets from information sources that include query terms and answer terms
US8682913B1 (en) 2005-03-31 2014-03-25 Google Inc. Corroborating facts extracted from multiple sources
US7567976B1 (en) * 2005-05-31 2009-07-28 Google Inc. Merging objects in a facts database
US8996470B1 (en) 2005-05-31 2015-03-31 Google Inc. System for ensuring the internal consistency of a fact repository
US7831545B1 (en) 2005-05-31 2010-11-09 Google Inc. Identifying the unifying subject of a set of facts
US7512620B2 (en) * 2005-08-19 2009-03-31 Google Inc. Data structure for incremental search
US8260785B2 (en) 2006-02-17 2012-09-04 Google Inc. Automatic object reference identification and linking in a browseable fact repository
US7991797B2 (en) 2006-02-17 2011-08-02 Google Inc. ID persistence through normalization
US8700568B2 (en) 2006-02-17 2014-04-15 Google Inc. Entity normalization via name normalization
US9495358B2 (en) 2006-10-10 2016-11-15 Abbyy Infopoisk Llc Cross-language text clustering
US8122026B1 (en) 2006-10-20 2012-02-21 Google Inc. Finding and disambiguating references to entities on web pages
US8285697B1 (en) 2007-01-23 2012-10-09 Google Inc. Feedback enhanced attribute extraction
US8347202B1 (en) 2007-03-14 2013-01-01 Google Inc. Determining geographic locations for place names in a fact repository
US7739212B1 (en) * 2007-03-28 2010-06-15 Google Inc. System and method for updating facts in a fact repository
US8239350B1 (en) 2007-05-08 2012-08-07 Google Inc. Date ambiguity resolution
US7761473B2 (en) * 2007-05-18 2010-07-20 Microsoft Corporation Typed relationships between items
US7966291B1 (en) 2007-06-26 2011-06-21 Google Inc. Fact-based object merging
US7970766B1 (en) 2007-07-23 2011-06-28 Google Inc. Entity type assignment
US8738643B1 (en) 2007-08-02 2014-05-27 Google Inc. Learning synonymous object names from anchor texts
US7984032B2 (en) * 2007-08-31 2011-07-19 Microsoft Corporation Iterators for applying term occurrence-level constraints in natural language searching
US8812435B1 (en) * 2007-11-16 2014-08-19 Google Inc. Learning objects and facts from documents
US8346791B1 (en) 2008-05-16 2013-01-01 Google Inc. Search augmentation
US20090307183A1 (en) * 2008-06-10 2009-12-10 Eric Arno Vigen System and Method for Transmission of Communications by Unique Definition Identifiers
US8645391B1 (en) 2008-07-03 2014-02-04 Google Inc. Attribute-value extraction from structured documents
US20140142920A1 (en) 2008-08-13 2014-05-22 International Business Machines Corporation Method and apparatus for Utilizing Structural Information in Semi-Structured Documents to Generate Candidates for Question Answering Systems
US8412749B2 (en) * 2009-01-16 2013-04-02 Google Inc. Populating a structured presentation with new values
EP2416257A4 (en) * 2009-03-31 2015-04-22 Fujitsu Ltd COMPUTER-ASSISTED NAME IDENTIFICATION EQUIPMENT, NAME IDENTIFICATION METHOD, AND NAME IDENTIFICATION PROGRAM
US8434134B2 (en) 2010-05-26 2013-04-30 Google Inc. Providing an electronic document collection
US8775400B2 (en) * 2010-06-30 2014-07-08 Microsoft Corporation Extracting facts from social network messages
US8346792B1 (en) * 2010-11-09 2013-01-01 Google Inc. Query generation using structural similarity between documents
US9460207B2 (en) * 2010-12-08 2016-10-04 Microsoft Technology Licensing, Llc Automated database generation for answering fact lookup queries
US8655866B1 (en) 2011-02-10 2014-02-18 Google Inc. Returning factual answers in response to queries
US9632994B2 (en) 2011-03-11 2017-04-25 Microsoft Technology Licensing, Llc Graphical user interface that supports document annotation
US9626348B2 (en) 2011-03-11 2017-04-18 Microsoft Technology Licensing, Llc Aggregating document annotations
US9075873B2 (en) 2011-03-11 2015-07-07 Microsoft Technology Licensing, Llc Generation of context-informative co-citation graphs
US9582591B2 (en) 2011-03-11 2017-02-28 Microsoft Technology Licensing, Llc Generating visual summaries of research documents
US8719692B2 (en) 2011-03-11 2014-05-06 Microsoft Corporation Validation, rejection, and modification of automatically generated document annotations
US8768782B1 (en) 2011-06-10 2014-07-01 Linkedin Corporation Optimized cloud computing fact checking
US9087048B2 (en) 2011-06-10 2015-07-21 Linkedin Corporation Method of and system for validating a fact checking system
US9116996B1 (en) 2011-07-25 2015-08-25 Google Inc. Reverse question answering
US8782042B1 (en) * 2011-10-14 2014-07-15 Firstrain, Inc. Method and system for identifying entities
US8856640B1 (en) 2012-01-20 2014-10-07 Google Inc. Method and apparatus for applying revision specific electronic signatures to an electronically stored document
US20130246435A1 (en) * 2012-03-14 2013-09-19 Microsoft Corporation Framework for document knowledge extraction
US8819047B2 (en) 2012-04-04 2014-08-26 Microsoft Corporation Fact verification engine
US9659059B2 (en) * 2012-07-20 2017-05-23 Salesforce.Com, Inc. Matching large sets of words
US9619458B2 (en) 2012-07-20 2017-04-11 Salesforce.Com, Inc. System and method for phrase matching with arbitrary text
US20140052647A1 (en) * 2012-08-17 2014-02-20 Truth Seal Corporation System and Method for Promoting Truth in Public Discourse
EP2916238A4 (en) * 2012-10-19 2016-06-15 Rakuten Inc CORPUS CREATIVE DEVICE, CORPUSED CREATION PROCESS AND CORPUSED CREATING PROGRAM
US9870554B1 (en) 2012-10-23 2018-01-16 Google Inc. Managing documents based on a user's calendar
US9529916B1 (en) 2012-10-30 2016-12-27 Google Inc. Managing documents based on access context
US11308037B2 (en) 2012-10-30 2022-04-19 Google Llc Automatic collaboration
US9483159B2 (en) 2012-12-12 2016-11-01 Linkedin Corporation Fact checking graphical user interface including fact checking icons
US9495341B1 (en) 2012-12-18 2016-11-15 Google Inc. Fact correction and completion during document drafting
US9384285B1 (en) 2012-12-18 2016-07-05 Google Inc. Methods for identifying related documents
US9235626B2 (en) 2013-03-13 2016-01-12 Google Inc. Automatic generation of snippets based on context and user interest
US10713261B2 (en) 2013-03-13 2020-07-14 Google Llc Generating insightful connections between graph entities
US9224103B1 (en) 2013-03-13 2015-12-29 Google Inc. Automatic annotation for training and evaluation of semantic analysis engines
US10810193B1 (en) 2013-03-13 2020-10-20 Google Llc Querying a data graph using natural language queries
US9235653B2 (en) 2013-06-26 2016-01-12 Google Inc. Discovering entity actions for an entity graph
US9342622B2 (en) 2013-06-27 2016-05-17 Google Inc. Two-phase construction of data graphs from disparate inputs
US9514113B1 (en) 2013-07-29 2016-12-06 Google Inc. Methods for automatic footnote generation
US9842113B1 (en) 2013-08-27 2017-12-12 Google Inc. Context-based file selection
US10169424B2 (en) 2013-09-27 2019-01-01 Lucas J. Myslinski Apparatus, systems and methods for scoring and distributing the reliability of online information
US20150095320A1 (en) 2013-09-27 2015-04-02 Trooclick France Apparatus, systems and methods for scoring the reliability of online information
US9785696B1 (en) 2013-10-04 2017-10-10 Google Inc. Automatic discovery of new entities using graph reconciliation
WO2015051480A1 (en) 2013-10-09 2015-04-16 Google Inc. Automatic definition of entity collections
US9798829B1 (en) 2013-10-22 2017-10-24 Google Inc. Data graph interface
US10002117B1 (en) 2013-10-24 2018-06-19 Google Llc Translating annotation tags into suggested markup
US9529791B1 (en) 2013-12-12 2016-12-27 Google Inc. Template and content aware document and template editing
US9659056B1 (en) 2013-12-30 2017-05-23 Google Inc. Providing an explanation of a missing fact estimate
RU2586577C2 (ru) 2014-01-15 2016-06-10 ABBYY InfoPoisk LLC Filtering of arcs in a syntactic graph
US9972055B2 (en) 2014-02-28 2018-05-15 Lucas J. Myslinski Fact checking method and system utilizing social networking information
US9643722B1 (en) 2014-02-28 2017-05-09 Lucas J. Myslinski Drone device security system
US8990234B1 (en) * 2014-02-28 2015-03-24 Lucas J. Myslinski Efficient fact checking method and system
US9703763B1 (en) 2014-08-14 2017-07-11 Google Inc. Automatic document citations by utilizing copied content for candidate sources
US9189514B1 (en) 2014-09-04 2015-11-17 Lucas J. Myslinski Optimized fact checking method and system
US20160078364A1 (en) * 2014-09-17 2016-03-17 Microsoft Corporation Computer-Implemented Identification of Related Items
US9672251B1 (en) * 2014-09-29 2017-06-06 Google Inc. Extracting facts from documents
US9626358B2 (en) 2014-11-26 2017-04-18 Abbyy Infopoisk Llc Creating ontologies by analyzing natural language texts
US20160162576A1 (en) * 2014-12-05 2016-06-09 Lightning Source Inc. Automated content classification/filtering
US9594554B2 (en) * 2015-07-30 2017-03-14 International Buisness Machines Corporation Extraction and transformation of executable online documentation
US10318564B2 (en) 2015-09-28 2019-06-11 Microsoft Technology Licensing, Llc Domain-specific unstructured text retrieval
US10354188B2 (en) 2016-08-02 2019-07-16 Microsoft Technology Licensing, Llc Extracting facts from unstructured information
US10268965B2 (en) 2015-10-27 2019-04-23 Yardi Systems, Inc. Dictionary enhancement technique for business name categorization
US10275841B2 (en) 2015-10-27 2019-04-30 Yardi Systems, Inc. Apparatus and method for efficient business name categorization
US10275708B2 (en) 2015-10-27 2019-04-30 Yardi Systems, Inc. Criteria enhancement technique for business name categorization
US10274983B2 (en) 2015-10-27 2019-04-30 Yardi Systems, Inc. Extended business name categorization apparatus and method
US11216718B2 (en) 2015-10-27 2022-01-04 Yardi Systems, Inc. Energy management system
US10346448B2 (en) 2016-07-13 2019-07-09 Google Llc System and method for classifying an alphanumeric candidate identified in an email message
US11568274B2 (en) * 2016-08-05 2023-01-31 Google Llc Surfacing unique facts for entities
RU2640718C1 (ru) 2016-12-22 2018-01-11 ABBYY Production LLC Verification of attributes of information objects
US10255271B2 (en) * 2017-02-06 2019-04-09 International Business Machines Corporation Disambiguation of the meaning of terms based on context pattern detection
US10572601B2 (en) 2017-07-28 2020-02-25 International Business Machines Corporation Unsupervised template extraction
WO2019090318A1 (en) 2017-11-06 2019-05-09 Cornell University Verifying text summaries of relational data sets
US11144530B2 (en) * 2017-12-21 2021-10-12 International Business Machines Corporation Regulating migration and recall actions for high latency media (HLM) on objects or group of objects through metadata locking attributes
US11194832B2 (en) * 2018-09-13 2021-12-07 Sap Se Normalization of unstructured catalog data
JP6998282B2 (ja) * 2018-09-19 2022-01-18 Yahoo Japan Corporation Information processing device, information processing method, and program
US11308141B2 (en) * 2018-12-26 2022-04-19 Yahoo Assets Llc Template generation using directed acyclic word graphs
US11170064B2 (en) 2019-03-05 2021-11-09 Corinne David Method and system to filter out unwanted content from incoming social media data
US11829722B2 (en) * 2019-05-31 2023-11-28 Nec Corporation Parameter learning apparatus, parameter learning method, and computer readable recording medium
US11630849B2 (en) * 2020-02-21 2023-04-18 International Business Machines Corporation Optimizing insight generation in heterogeneous datasets
JP7197531B2 (ja) * 2020-03-19 2022-12-27 Yahoo Japan Corporation Information processing device, information processing system, information processing method, and program
US11443101B2 (en) 2020-11-03 2022-09-13 International Business Machine Corporation Flexible pseudo-parsing of dense semi-structured text

Family Cites Families (299)

Publication number Priority date Publication date Assignee Title
US5010478A (en) 1986-04-11 1991-04-23 Deran Roger L Entity-attribute value database system with inverse attribute for selectively relating two different entities
US5133075A (en) 1988-12-19 1992-07-21 Hewlett-Packard Company Method of monitoring changes in attribute values of object in an object-oriented database
US5440730A (en) 1990-08-09 1995-08-08 Bell Communications Research, Inc. Time index access structure for temporal databases having concurrent multiple versions
CA2048306A1 (en) 1990-10-02 1992-04-03 Steven P. Miller Distributed configuration profile for computing system
US5347653A (en) 1991-06-28 1994-09-13 Digital Equipment Corporation System for reconstructing prior versions of indexes using records indicating changes between successive versions of the indexes
US5694590A (en) 1991-09-27 1997-12-02 The Mitre Corporation Apparatus and method for the detection of security violations in multilevel secure databases
JPH05174020A (ja) 1991-12-26 1993-07-13 Okinawa Nippon Denki Software Kk Japanese language processing device
US5574898A (en) 1993-01-08 1996-11-12 Atria Software, Inc. Dynamic software version auditor which monitors a process to provide a list of objects that are accessed
US7082426B2 (en) 1993-06-18 2006-07-25 Cnet Networks, Inc. Content aggregation method and apparatus for an on-line product catalog
US5519608A (en) * 1993-06-24 1996-05-21 Xerox Corporation Method for extracting from a text corpus answers to questions stated in natural language by using linguistic analysis and hypothesis generation
US5546507A (en) 1993-08-20 1996-08-13 Unisys Corporation Apparatus and method for generating a knowledge base
US5560005A (en) 1994-02-25 1996-09-24 Actamed Corp. Methods and systems for object-based relational distributed databases
US5680622A (en) 1994-06-30 1997-10-21 Borland International, Inc. System and methods for quickly detecting shareability of symbol and type information in header files
US5675785A (en) 1994-10-04 1997-10-07 Hewlett-Packard Company Data warehouse which is accessed by a user using a schema of virtual tables
JP2809341B2 (ja) * 1994-11-18 1998-10-08 松下電器産業株式会社 Information summarization method, information summarization apparatus, weighting method, and teletext broadcast receiving apparatus
US5608903A (en) 1994-12-15 1997-03-04 Novell, Inc. Method and apparatus for moving subtrees in a distributed network directory
US5717911A (en) * 1995-01-23 1998-02-10 Tandem Computers, Inc. Relational database system and method with high availability compilation of SQL programs
US5793966A (en) 1995-12-01 1998-08-11 Vermeer Technologies, Inc. Computer system and computer-implemented process for creation and maintenance of online services
US5724571A (en) * 1995-07-07 1998-03-03 Sun Microsystems, Inc. Method and apparatus for generating query responses in a computer-based document retrieval system
US5717951A (en) 1995-08-07 1998-02-10 Yabumoto; Kan W. Method for storing and retrieving information on a magnetic storage medium via data blocks of variable sizes
US6006221A (en) 1995-08-16 1999-12-21 Syracuse University Multilingual document retrieval system and method using semantic vector matching
US5838979A (en) 1995-10-31 1998-11-17 Peritus Software Services, Inc. Process and tool for scalable automated data field replacement
US5701470A (en) 1995-12-08 1997-12-23 Sun Microsystems, Inc. System and method for space efficient object locking using a data subarray and pointers
US5815415A (en) 1996-01-19 1998-09-29 Bentley Systems, Incorporated Computer system for portable persistent modeling
US5802299A (en) 1996-02-13 1998-09-01 Microtouch Systems, Inc. Interactive system for authoring hypertext document collections
US5778378A (en) 1996-04-30 1998-07-07 International Business Machines Corporation Object oriented information retrieval framework mechanism
US5920859A (en) 1997-02-05 1999-07-06 Idd Enterprises, L.P. Hypertext document retrieval system and method
US5819210A (en) 1996-06-21 1998-10-06 Xerox Corporation Method of lazy contexted copying during unification
US6052693A (en) 1996-07-02 2000-04-18 Harlequin Group Plc System for assembling large databases through information extracted from text sources
US5987460A (en) 1996-07-05 1999-11-16 Hitachi, Ltd. Document retrieval-assisting method and system for the same and document retrieval service using the same with document frequency and term frequency
US5819265A (en) 1996-07-12 1998-10-06 International Business Machines Corporation Processing names in a text
US5778373A (en) 1996-07-15 1998-07-07 At&T Corp Integration of an information server database schema by generating a translation map from exemplary files
US5787413A (en) 1996-07-29 1998-07-28 International Business Machines Corporation C++ classes for a digital library
US6820093B2 (en) 1996-07-30 2004-11-16 Hyperphrase Technologies, Llc Method for verifying record code prior to an action based on the code
US5826258A (en) 1996-10-02 1998-10-20 Junglee Corporation Method and apparatus for structuring the querying and interpretation of semistructured information
US6285999B1 (en) 1997-01-10 2001-09-04 The Board Of Trustees Of The Leland Stanford Junior University Method for node ranking in a linked database
US7269587B1 (en) 1997-01-10 2007-09-11 The Board Of Trustees Of The Leland Stanford Junior University Scoring documents in a linked database
AUPO525497A0 (en) * 1997-02-21 1997-03-20 Mills, Dudley John Network-based classified information systems
US6134555A (en) 1997-03-10 2000-10-17 International Business Machines Corporation Dimension reduction using association rules for data mining application
US5822743A (en) 1997-04-08 1998-10-13 1215627 Ontario Inc. Knowledge-based information retrieval system
US5882743A (en) 1997-04-21 1999-03-16 Kimberly-Clark Worldwide, Inc. Absorbent folded hand towel
US6038560A (en) * 1997-05-21 2000-03-14 Oracle Corporation Concept knowledge base search and retrieval system
US5974254A (en) 1997-06-06 1999-10-26 National Instruments Corporation Method for detecting differences between graphical programs
US5893093A (en) * 1997-07-02 1999-04-06 The Sabre Group, Inc. Information search and retrieval with geographical coordinates
AU735024B2 (en) 1997-07-25 2001-06-28 British Telecommunications Public Limited Company Scheduler for a software system
DE69803575T2 (de) 1997-07-25 2002-08-29 British Telecomm Visualization in a modular software system
AU753202B2 (en) 1997-07-25 2002-10-10 British Telecommunications Public Limited Company Software system generation
US5909689A (en) 1997-09-18 1999-06-01 Sony Corporation Automatic update of file versions for files shared by several computers which record in respective file directories temporal information for indicating when the files have been created
US6073130A (en) * 1997-09-23 2000-06-06 At&T Corp. Method for improving the results of a search in a structured database
US6442540B2 (en) * 1997-09-29 2002-08-27 Kabushiki Kaisha Toshiba Information retrieval apparatus and information retrieval method
US6996572B1 (en) 1997-10-08 2006-02-07 International Business Machines Corporation Method and system for filtering of information entities
US6018741A (en) * 1997-10-22 2000-01-25 International Business Machines Corporation Method and system for managing objects in a dynamic inheritance tree
US6112210A (en) 1997-10-31 2000-08-29 Oracle Corporation Apparatus and method for null representation in database object storage
WO1999027556A2 (en) * 1997-11-20 1999-06-03 Xacct Technologies, Inc. Network accounting and billing system and method
US5943670A (en) 1997-11-21 1999-08-24 International Business Machines Corporation System and method for categorizing objects in combined categories
US6349275B1 (en) 1997-11-24 2002-02-19 International Business Machines Corporation Multiple concurrent language support system for electronic catalogue using a concept based knowledge representation
US6212526B1 (en) 1997-12-02 2001-04-03 Microsoft Corporation Method for apparatus for efficient mining of classification models from databases
US6094650A (en) 1997-12-15 2000-07-25 Manning & Napier Information Services Database analysis using a probabilistic ontology
FI106089B (fi) 1997-12-23 2000-11-15 Sonera Oyj Tracking a mobile terminal in a mobile communication system
JPH11265400A (ja) 1998-03-13 1999-09-28 Omron Corp Information processing apparatus and method, network system, and recording medium
US6044366A (en) 1998-03-16 2000-03-28 Microsoft Corporation Use of the UNPIVOT relational operator in the efficient gathering of sufficient statistics for data mining
US6078918A (en) 1998-04-02 2000-06-20 Trivada Corporation Online predictive memory
US6112203A (en) 1998-04-09 2000-08-29 Altavista Company Method for ranking documents in a hyperlinked environment using connectivity and selective content analysis
US6567846B1 (en) * 1998-05-15 2003-05-20 E.Piphany, Inc. Extensible user interface for a distributed messaging framework in a computer network
US6122647A (en) 1998-05-19 2000-09-19 Perspecta, Inc. Dynamic generation of contextual links in hypertext documents
US6742003B2 (en) 2001-04-30 2004-05-25 Microsoft Corporation Apparatus and accompanying methods for visualizing clusters of data and hierarchical cluster classifications
US6327574B1 (en) 1998-07-07 2001-12-04 Encirq Corporation Hierarchical models of consumer attributes for targeting content in a privacy-preserving manner
US6240546B1 (en) 1998-07-24 2001-05-29 International Business Machines Corporation Identifying date fields for runtime year 2000 system solution process, method and article of manufacture
US7409381B1 (en) * 1998-07-30 2008-08-05 British Telecommunications Public Limited Company Index to a semi-structured database
US6665837B1 (en) 1998-08-10 2003-12-16 Overture Services, Inc. Method for identifying related pages in a hyperlinked database
US6694482B1 (en) 1998-09-11 2004-02-17 Sbc Technology Resources, Inc. System and methods for an architectural framework for design of an adaptive, personalized, interactive content delivery system
US6470330B1 (en) 1998-11-05 2002-10-22 Sybase, Inc. Database system with methods for estimation and usage of index page cluster ratio (IPCR) and data page cluster ratio (DPCR)
FR2787957B1 (fr) * 1998-12-28 2001-10-05 Inst Nat Rech Inf Automat Method for processing a query
US6572661B1 (en) 1999-01-11 2003-06-03 Cisco Technology, Inc. System and method for automated annotation of files
US6377943B1 (en) * 1999-01-20 2002-04-23 Oracle Corp. Initial ordering of tables for database queries
US7003719B1 (en) 1999-01-25 2006-02-21 West Publishing Company, Dba West Group System, method, and software for inserting hyperlinks into documents
US6565610B1 (en) 1999-02-11 2003-05-20 Navigation Technologies Corporation Method and system for text placement when forming maps
US6574635B2 (en) 1999-03-03 2003-06-03 Siebel Systems, Inc. Application instantiation based upon attributes and values stored in a meta data repository, including tiering of application layers objects and components
US6584464B1 (en) 1999-03-19 2003-06-24 Ask Jeeves, Inc. Grammar template query system
US6397228B1 (en) * 1999-03-31 2002-05-28 Verizon Laboratories Inc. Data enhancement techniques
US6763496B1 (en) 1999-03-31 2004-07-13 Microsoft Corporation Method for promoting contextual information to display pages containing hyperlinks
US6263328B1 (en) 1999-04-09 2001-07-17 International Business Machines Corporation Object oriented query model and process for complex heterogeneous database queries
US20030195872A1 (en) * 1999-04-12 2003-10-16 Paul Senn Web-based information content analyzer and information dimension dictionary
US6721713B1 (en) 1999-05-27 2004-04-13 Andersen Consulting Llp Business alliance identification in a web architecture framework
US6606625B1 (en) 1999-06-03 2003-08-12 University Of Southern California Wrapper induction by hierarchical data analysis
US6711585B1 (en) 1999-06-15 2004-03-23 Kanisa Inc. System and method for implementing a knowledge management system
US6438543B1 (en) 1999-06-17 2002-08-20 International Business Machines Corporation System and method for cross-document coreference
US6473898B1 (en) 1999-07-06 2002-10-29 Pcorder.Com, Inc. Method for compiling and selecting data attributes
US6873982B1 (en) * 1999-07-16 2005-03-29 International Business Machines Corporation Ordering of database search results based on user feedback
EP1072987A1 (en) * 1999-07-29 2001-01-31 International Business Machines Corporation Geographic web browser and iconic hyperlink cartography
US6341306B1 (en) * 1999-08-13 2002-01-22 Atomica Corporation Web-based information retrieval responsive to displayed word identified by a text-grabbing algorithm
CA2281331A1 (en) 1999-09-03 2001-03-03 Cognos Incorporated Database management system
US6845354B1 (en) * 1999-09-09 2005-01-18 Institute For Information Industry Information retrieval system with a neuro-fuzzy structure
US6754873B1 (en) 1999-09-20 2004-06-22 Google Inc. Techniques for finding related hyperlinked documents using link-based analysis
GB2371901B (en) 1999-09-21 2004-06-23 Andrew E Borthwick A probabilistic record linkage model derived from training data
AU2702701A (en) 1999-10-15 2001-04-23 Milind Kotwal Method of categorization and indexing of information
US6665666B1 (en) 1999-10-26 2003-12-16 International Business Machines Corporation System, method and program product for answering questions using a search engine
US6850896B1 (en) 1999-10-28 2005-02-01 Market-Touch Corporation Method and system for managing and providing sales data using world wide web
JP3888812B2 (ja) * 1999-11-01 2007-03-07 富士通株式会社 Fact data integration method and apparatus
US6804667B1 (en) 1999-11-30 2004-10-12 Ncr Corporation Filter for checking for duplicate entries in database
US6963867B2 (en) 1999-12-08 2005-11-08 A9.Com, Inc. Search query processing to provide category-ranked presentation of search results
US7305380B1 (en) 1999-12-15 2007-12-04 Google Inc. Systems and methods for performing in-context searching
US6865582B2 (en) 2000-01-03 2005-03-08 Bechtel Bwxt Idaho, Llc Systems and methods for knowledge discovery in spatial data
US6606659B1 (en) 2000-01-28 2003-08-12 Websense, Inc. System and method for controlling access to internet sites
US6665659B1 (en) 2000-02-01 2003-12-16 James D. Logan Methods and apparatus for distributing and using metadata via the internet
US6567936B1 (en) 2000-02-08 2003-05-20 Microsoft Corporation Data clustering using error-tolerant frequent item sets
AU2001241564A1 (en) 2000-02-17 2001-08-27 E-Numerate Solutions, Inc. Rdl search engine
US6584646B2 (en) 2000-02-29 2003-07-01 Katoh Electrical Machinery Co., Ltd. Tilt hinge for office automation equipment
US6901403B1 (en) 2000-03-02 2005-05-31 Quovadx, Inc. XML presentation of general-purpose data sources
US6311194B1 (en) 2000-03-15 2001-10-30 Taalee, Inc. System and method for creating a semantic web and its applications in browsing, searching, profiling, personalization and advertising
US6738767B1 (en) * 2000-03-20 2004-05-18 International Business Machines Corporation System and method for discovering schematic structure in hypertext documents
US6502102B1 (en) 2000-03-27 2002-12-31 Accenture Llp System, method and article of manufacture for a table-driven automated scripting architecture
US6643641B1 (en) 2000-04-27 2003-11-04 Russell Snyder Web search engine with graphic snapshots
EP1156430A2 (en) 2000-05-17 2001-11-21 Matsushita Electric Industrial Co., Ltd. Information retrieval system
US6957213B1 (en) 2000-05-17 2005-10-18 Inquira, Inc. Method of utilizing implicit references to answer a query
US7062483B2 (en) 2000-05-18 2006-06-13 Endeca Technologies, Inc. Hierarchical data-driven search and navigation system and method for information retrieval
US7325201B2 (en) * 2000-05-18 2008-01-29 Endeca Technologies, Inc. System and method for manipulating content in a hierarchical data-driven search and navigation system
WO2001090921A2 (en) 2000-05-25 2001-11-29 Kanisa, Inc. System and method for automatically classifying text
US6487495B1 (en) 2000-06-02 2002-11-26 Navigation Technologies Corporation Navigation applications using related location-referenced keywords
US6963876B2 (en) 2000-06-05 2005-11-08 International Business Machines Corporation System and method for searching extended regular expressions
US6745189B2 (en) 2000-06-05 2004-06-01 International Business Machines Corporation System and method for enabling multi-indexing of objects
US20020042707A1 (en) 2000-06-19 2002-04-11 Gang Zhao Grammar-packaged parsing
US7162499B2 (en) * 2000-06-21 2007-01-09 Microsoft Corporation Linked value replication
GB0015233D0 (en) * 2000-06-21 2000-08-16 Canon Kk Indexing method and apparatus
MXPA03000110A (es) * 2000-06-22 2006-06-08 Mayer Yaron Search system and method for finding and contacting dates in instant messengers on the network and/or by other methods capable of finding and creating immediate contact
US7003506B1 (en) * 2000-06-23 2006-02-21 Microsoft Corporation Method and system for creating an embedded search link document
US6578032B1 (en) 2000-06-28 2003-06-10 Microsoft Corporation Method and system for performing phrase/word clustering and cluster merging
US7080085B1 (en) 2000-07-12 2006-07-18 International Business Machines Corporation System and method for ensuring referential integrity for heterogeneously scoped references in an information management system
US6728728B2 (en) 2000-07-24 2004-04-27 Israel Spiegler Unified binary model and methodology for knowledge representation and for data and information mining
US6675159B1 (en) * 2000-07-27 2004-01-06 Science Applic Int Corp Concept-based search and retrieval system
US7146536B2 (en) 2000-08-04 2006-12-05 Sun Microsystems, Inc. Fact collection for product knowledge management
US7100082B2 (en) 2000-08-04 2006-08-29 Sun Microsystems, Inc. Check creation and maintenance for product knowledge management
US7080073B1 (en) 2000-08-18 2006-07-18 Firstrain, Inc. Method and apparatus for focused crawling
US6556991B1 (en) * 2000-09-01 2003-04-29 E-Centives, Inc. Item name normalization
US6823495B1 (en) 2000-09-14 2004-11-23 Microsoft Corporation Mapping tool graphical user interface
US6832218B1 (en) 2000-09-22 2004-12-14 International Business Machines Corporation System and method for associating search results
US7493308B1 (en) * 2000-10-03 2009-02-17 A9.Com, Inc. Searching documents using a dimensional database
US6684205B1 (en) * 2000-10-18 2004-01-27 International Business Machines Corporation Clustering hypertext with applications to web searching
JP2002157276A (ja) 2000-11-16 2002-05-31 Hitachi Software Eng Co Ltd Problem-solving support method and system
US20020174099A1 (en) * 2000-11-28 2002-11-21 Anthony Raj Minimal identification
US7013308B1 (en) * 2000-11-28 2006-03-14 Semscript Ltd. Knowledge storage and retrieval system and method
US8402068B2 (en) * 2000-12-07 2013-03-19 Half.Com, Inc. System and method for collecting, associating, normalizing and presenting product and vendor information on a distributed network
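JP2002230035A (ja) 2001-01-05 2002-08-16 Internatl Business Mach Corp <Ibm> Information organizing method, information processing apparatus, information processing system, storage medium, and program transmission apparatus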
US6879969B2 (en) 2001-01-21 2005-04-12 Volvo Technological Development Corporation System and method for real-time recognition of driving patterns
US6693651B2 (en) * 2001-02-07 2004-02-17 International Business Machines Corporation Customer self service iconic interface for resource search results display and selection
US7143099B2 (en) 2001-02-08 2006-11-28 Amdocs Software Systems Limited Historical data warehousing system
US7216073B2 (en) * 2001-03-13 2007-05-08 Intelligate, Ltd. Dynamic natural language understanding
US6820081B1 (en) 2001-03-19 2004-11-16 Attenex Corporation System and method for evaluating a structured message store for message redundancy
US20020147738A1 (en) 2001-04-06 2002-10-10 Reader Scot A. Method and apparatus for finding patent-relevant web documents
JP4159366B2 (ja) 2001-04-12 2008-10-01 コーニンクレッカ フィリップス エレクトロニクス エヌ ヴィ Method and system for registering user preferences
US6556610B1 (en) * 2001-04-12 2003-04-29 E20 Communications, Inc. Semiconductor lasers
US20020169770A1 (en) 2001-04-27 2002-11-14 Kim Brian Seong-Gon Apparatus and method that categorize a collection of documents into a hierarchy of categories that are defined by the collection of documents
US7020662B2 (en) 2001-05-29 2006-03-28 Sun Microsystems, Inc. Method and system for determining a directory entry's class of service based on the value of a specifier in the entry
MXPA03011976A (es) * 2001-06-22 2005-07-01 Nervana Inc System and method for knowledge retrieval, management, delivery and presentation
US7003552B2 (en) * 2001-06-25 2006-02-21 Canon Kabushiki Kaisha Information processing apparatus and control method therefor
US7263656B2 (en) * 2001-07-16 2007-08-28 Canon Kabushiki Kaisha Method and device for scheduling, generating and processing a document comprising blocks of information
WO2003009251A1 (en) 2001-07-18 2003-01-30 Hyunjae Tech Co., Ltd System for automatic recognizing licence number of other vehicles on observation vehicles and method thereof
JP4571404B2 (ja) * 2001-07-26 2010-10-27 インターナショナル・ビジネス・マシーンズ・コーポレーション Data processing method, data processing system, and program
CA2354443A1 (en) 2001-07-31 2003-01-31 Ibm Canada Limited-Ibm Canada Limitee Method and system for visually constructing xml schemas using an object-oriented model
US6868411B2 (en) * 2001-08-13 2005-03-15 Xerox Corporation Fuzzy text categorizer
US7398201B2 (en) 2001-08-14 2008-07-08 Evri Inc. Method and system for enhanced data searching
WO2003017023A2 (en) 2001-08-14 2003-02-27 Quigo Technologies, Inc. System and method for extracting content for submission to a search engine
US7386832B2 (en) 2001-08-31 2008-06-10 Siebel Systems, Inc. Configurator using structure to provide a user interface
US7058653B2 (en) 2001-09-17 2006-06-06 Ricoh Company, Ltd. Tree system diagram output method, computer program and recording medium
US7403938B2 (en) * 2001-09-24 2008-07-22 Iac Search & Media, Inc. Natural language query processing
US7020641B2 (en) 2001-10-22 2006-03-28 Sun Microsystems, Inc. Method, system, and program for maintaining a database of data objects
US7197449B2 (en) * 2001-10-30 2007-03-27 Intel Corporation Method for extracting name entities and jargon terms using a suffix tree data structure
CN100461156C (zh) * 2001-11-09 2009-02-11 无锡永中科技有限公司 Integrated data processing system
JP3931214B2 (ja) 2001-12-17 2007-06-13 日本アイ・ビー・エム株式会社 Data analysis apparatus and program
US6965900B2 (en) 2001-12-19 2005-11-15 X-Labs Holdings, Llc Method and apparatus for electronically extracting application specific multidimensional information from documents selected from a set of documents electronically extracted from a library of electronically searchable documents
US7096231B2 (en) * 2001-12-28 2006-08-22 American Management Systems, Inc. Export engine which builds relational database directly from object model
US7209906B2 (en) * 2002-01-14 2007-04-24 International Business Machines Corporation System and method for implementing a metrics engine for tracking relationships over time
US7398461B1 (en) 2002-01-24 2008-07-08 Overture Services, Inc. Method for ranking web page search results
EP1485825A4 (en) 2002-02-04 2008-03-19 Cataphora Inc Detailed exploration technique of sociological data and corresponding apparatus
US20030149567A1 (en) 2002-02-04 2003-08-07 Tony Schmitz Method and system for using natural language in computer resource utilization analysis via a communications network
US7421660B2 (en) 2003-02-04 2008-09-02 Cataphora, Inc. Method and apparatus to visually present discussions for data mining purposes
US20030154071A1 (en) 2002-02-11 2003-08-14 Shreve Gregory M. Process for the document management and computer-assisted translation of documents utilizing document corpora constructed by intelligent agents
US7165024B2 (en) * 2002-02-22 2007-01-16 Nec Laboratories America, Inc. Inferring hierarchical descriptions of a set of documents
JP4098539B2 (ja) 2002-03-15 2008-06-11 富士通株式会社 Profile information recommendation method, program, and apparatus
US7043521B2 (en) * 2002-03-21 2006-05-09 Rockwell Electronic Commerce Technologies, Llc Search agent for searching the internet
JP3896014B2 (ja) 2002-03-22 2007-03-22 株式会社東芝 Information collection system, information collection method, and program for causing a computer to execute information collection
CA2479228C (en) 2002-03-27 2011-08-09 British Telecommunications Public Limited Company Network security system
US6857053B2 (en) 2002-04-10 2005-02-15 International Business Machines Corporation Method, system, and program for backing up objects by creating groups of objects
TWI256562B (en) 2002-05-03 2006-06-11 Ind Tech Res Inst Method for named-entity recognition and verification
US6963880B1 (en) 2002-05-10 2005-11-08 Oracle International Corporation Schema evolution of complex objects
US20040015481A1 (en) * 2002-05-23 2004-01-22 Kenneth Zinda Patent data mining
US7003522B1 (en) 2002-06-24 2006-02-21 Microsoft Corporation System and method for incorporating smart tags in online content
US20040003067A1 (en) 2002-06-27 2004-01-01 Daniel Ferrin System and method for enabling a user interface with GUI meta data
US20040006748A1 (en) 2002-07-03 2004-01-08 Amit Srivastava Systems and methods for providing online event tracking
GB0215464D0 (en) 2002-07-04 2002-08-14 Hewlett Packard Co Combining data descriptions
WO2004019264A1 (en) 2002-08-22 2004-03-04 Agency For Science, Technology And Research Prediction by collective likelihood from emerging patterns
US20040059726A1 (en) * 2002-09-09 2004-03-25 Jeff Hunter Context-sensitive wordless search
US20040064447A1 (en) * 2002-09-27 2004-04-01 Simske Steven J. System and method for management of synonymic searching
US6886010B2 (en) * 2002-09-30 2005-04-26 The United States Of America As Represented By The Secretary Of The Navy Method for data and text mining and literature-based discovery
US7096217B2 (en) 2002-10-31 2006-08-22 International Business Machines Corporation Global query correlation attributes
US20050108256A1 (en) 2002-12-06 2005-05-19 Attensity Corporation Visualization of integrated structured and unstructured data
US7277879B2 (en) 2002-12-17 2007-10-02 Electronic Data Systems Corporation Concept navigation in data storage systems
US7181450B2 (en) 2002-12-18 2007-02-20 International Business Machines Corporation Method, system, and program for use of metadata to create multidimensional cubes in a relational database
US20040122846A1 (en) * 2002-12-19 2004-06-24 Ibm Corporation Fact verification system
US7107528B2 (en) 2002-12-20 2006-09-12 International Business Machines Corporation Automatic completion of dates
US7472182B1 (en) 2002-12-31 2008-12-30 Emc Corporation Data collection policy for storage devices
GB0304639D0 (en) 2003-02-28 2003-04-02 Kiq Ltd Classification using re-sampling of probability estimates
US7020666B2 (en) 2003-03-07 2006-03-28 Microsoft Corporation System and method for unknown type serialization
US7051023B2 (en) 2003-04-04 2006-05-23 Yahoo! Inc. Systems and methods for generating concept units from search queries
EP1629359A4 (en) 2003-04-07 2008-01-09 Sevenecho Llc Method, system and software for customizing personalized narrative presentations
US20040243552A1 (en) * 2003-05-30 2004-12-02 Dictaphone Corporation Method, system, and apparatus for viewing data
US8095544B2 (en) 2003-05-30 2012-01-10 Dictaphone Corporation Method, system, and apparatus for validation
US7747571B2 (en) 2003-04-15 2010-06-29 At&T Intellectual Property, I,L.P. Methods, systems, and computer program products for implementing logical and physical data models
EP1477892B1 (en) * 2003-05-16 2015-12-23 Sap Se System, method, computer program product and article of manufacture for inputting data in a computer system
JP2004362223A (ja) 2003-06-04 2004-12-24 Hitachi Ltd Information mining system
US7836391B2 (en) 2003-06-10 2010-11-16 Google Inc. Document search engine including highlighting of confident results
US9026901B2 (en) 2003-06-20 2015-05-05 International Business Machines Corporation Viewing annotations across multiple applications
US7162473B2 (en) 2003-06-26 2007-01-09 Microsoft Corporation Method and system for usage analyzer that determines user accessed sources, indexes data subsets, and associated metadata, processing implicit queries based on potential interest to users
US7739588B2 (en) 2003-06-27 2010-06-15 Microsoft Corporation Leveraging markup language data for semantically labeling text strings and data and for providing actions based on semantically labeled text strings and data
WO2005008358A2 (en) 2003-07-22 2005-01-27 Kinor Technologies Inc. Information access using ontologies
WO2005010727A2 (en) * 2003-07-23 2005-02-03 Praedea Solutions, Inc. Extracting data from semi-structured text documents
CA2536265C (en) * 2003-08-21 2012-11-13 Idilia Inc. System and method for processing a query
US20050055365A1 (en) * 2003-09-09 2005-03-10 I.V. Ramakrishnan Scalable data extraction techniques for transforming electronic documents into queriable archives
US7644076B1 (en) * 2003-09-12 2010-01-05 Teradata Us, Inc. Clustering strings using N-grams
US8589373B2 (en) 2003-09-14 2013-11-19 Yaron Mayer System and method for improved searching on the internet or similar networks and especially improved MetaNews and/or improved automatically generated newspapers
US8086690B1 (en) 2003-09-22 2011-12-27 Google Inc. Determining geographical relevance of web documents
US7496560B2 (en) * 2003-09-23 2009-02-24 Amazon Technologies, Inc. Personalized searchable library with highlighting capabilities
US7158980B2 (en) * 2003-10-02 2007-01-02 Acer Incorporated Method and apparatus for computerized extracting of scheduling information from a natural language e-mail
AU2003290397A1 (en) 2003-10-15 2005-04-27 Dharamdas Gautam Goradia Interactive wisdom system
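JP4729844B2 (ja) * 2003-10-16 2011-07-20 富士ゼロックス株式会社 Server apparatus, information providing method, and program
KR100533810B1 (ko) * 2003-10-16 2005-12-07 한국전자통신연구원 Method for semi-automatically constructing a knowledge base for an encyclopedia question-answering system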
US20050144241A1 (en) 2003-10-17 2005-06-30 Stata Raymond P. Systems and methods for a search-based email client
GB0325626D0 (en) 2003-11-03 2003-12-10 Infoshare Ltd Data aggregation
US20050108630A1 (en) 2003-11-19 2005-05-19 Wasson Mark D. Extraction of facts from text
US7512553B2 (en) 2003-12-05 2009-03-31 International Business Machines Corporation System for automated part-number mapping
US20050138007A1 (en) 2003-12-22 2005-06-23 International Business Machines Corporation Document enhancement method
US8150824B2 (en) 2003-12-31 2012-04-03 Google Inc. Systems and methods for direct navigation to specific portion of target document
US20050149851A1 (en) 2003-12-31 2005-07-07 Google Inc. Generating hyperlinks and anchor text in HTML and non-HTML documents
US7424467B2 (en) 2004-01-26 2008-09-09 International Business Machines Corporation Architecture for an indexer with fixed width sort and variable width sort
US7499913B2 (en) 2004-01-26 2009-03-03 International Business Machines Corporation Method for handling anchor text
WO2005083597A1 (en) 2004-02-20 2005-09-09 Dow Jones Reuters Business Interactive, Llc Intelligent search and retrieval system and method
US7756823B2 (en) 2004-03-26 2010-07-13 Lockheed Martin Corporation Dynamic reference repository
US7725498B2 (en) 2004-04-22 2010-05-25 International Business Machines Corporation Techniques for identifying mergeable data
US7260573B1 (en) 2004-05-17 2007-08-21 Google Inc. Personalizing anchor text scores in a search engine
US20050278314A1 (en) 2004-06-09 2005-12-15 Paul Buchheit Variable length snippet generation
US7716225B1 (en) 2004-06-17 2010-05-11 Google Inc. Ranking documents based on user behavior and/or feature data
US7454430B1 (en) 2004-06-18 2008-11-18 Glenbrook Networks System and method for facts extraction and domain knowledge repository creation from unstructured and semi-structured documents
US8051207B2 (en) * 2004-06-25 2011-11-01 Citrix Systems, Inc. Inferring server state in a stateless communication protocol
US20060036504A1 (en) 2004-08-11 2006-02-16 Allocca William W Dynamically classifying items for international delivery
US20060041375A1 (en) 2004-08-19 2006-02-23 Geographic Data Technology, Inc. Automated georeferencing of digitized map images
US7809695B2 (en) * 2004-08-23 2010-10-05 Thomson Reuters Global Resources Information retrieval systems with duplicate document detection and presentation functions
US20060047691A1 (en) * 2004-08-31 2006-03-02 Microsoft Corporation Creating a document index from a flex- and Yacc-generated named entity recognizer
US20060053171A1 (en) * 2004-09-03 2006-03-09 Biowisdom Limited System and method for curating one or more multi-relational ontologies
US20060053175A1 (en) * 2004-09-03 2006-03-09 Biowisdom Limited System and method for creating, editing, and utilizing one or more rules for multi-relational ontology creation and maintenance
US20060074910A1 (en) 2004-09-17 2006-04-06 Become, Inc. Systems and methods of retrieving topic specific information
JP4587756B2 (ja) 2004-09-21 2010-11-24 ルネサスエレクトロニクス株式会社 Semiconductor integrated circuit device
US20060064411A1 (en) * 2004-09-22 2006-03-23 William Gross Search engine using user intent
US7809763B2 (en) * 2004-10-15 2010-10-05 Oracle International Corporation Method(s) for updating database object metadata
US7822768B2 (en) 2004-11-23 2010-10-26 International Business Machines Corporation System and method for automating data normalization using text analytics
US9137115B2 (en) 2004-12-06 2015-09-15 Bmc Software, Inc. System and method for resource reconciliation in an enterprise management system
US20060167991A1 (en) 2004-12-16 2006-07-27 Heikes Brian D Buddy list filtering
US20060143227A1 (en) 2004-12-27 2006-06-29 Helm Martin W System and method for persisting software objects
US8719779B2 (en) 2004-12-28 2014-05-06 Sap Ag Data object association based on graph theory techniques
US7672971B2 (en) * 2006-02-17 2010-03-02 Google Inc. Modular architecture for entity normalization
US7769579B2 (en) * 2005-05-31 2010-08-03 Google Inc. Learning facts from semi-structured text
US20060149800A1 (en) 2004-12-30 2006-07-06 Daniel Egnor Authoritative document identification
US7685136B2 (en) 2005-01-12 2010-03-23 International Business Machines Corporation Method, system and program product for managing document summary information
US9208229B2 (en) 2005-03-31 2015-12-08 Google Inc. Anchor text summarization for corroboration
US7953720B1 (en) 2005-03-31 2011-05-31 Google Inc. Selecting the best answer to a fact query from among a set of potential answers
US7587387B2 (en) 2005-03-31 2009-09-08 Google Inc. User interface for facts query engine with snippets from information sources that include query terms and answer terms
US20060238919A1 (en) 2005-04-20 2006-10-26 The Boeing Company Adaptive data cleaning
US20060248456A1 (en) 2005-05-02 2006-11-02 Ibm Corporation Assigning a publication date for at least one electronic document
US20060259462A1 (en) * 2005-05-12 2006-11-16 Sybase, Inc. System and Methodology for Real-time Content Aggregation and Syndication
US7590647B2 (en) 2005-05-27 2009-09-15 Rage Frameworks, Inc Method for extracting, interpreting and standardizing tabular data from unstructured documents
US20060277169A1 (en) 2005-06-02 2006-12-07 Lunt Tracy T Using the quantity of electronically readable text to generate a derivative attribute for an electronic file
US7630977B2 (en) 2005-06-29 2009-12-08 Xerox Corporation Categorization including dependencies between different category systems
US20070005593A1 (en) 2005-06-30 2007-01-04 Microsoft Corporation Attribute-based data retrieval and association
CA2545232A1 (en) * 2005-07-29 2007-01-29 Cognos Incorporated Method and system for creating a taxonomy from business-oriented metadata content
US8666928B2 (en) * 2005-08-01 2014-03-04 Evi Technologies Limited Knowledge repository
US7797282B1 (en) 2005-09-29 2010-09-14 Hewlett-Packard Development Company, L.P. System and method for modifying a training set
US7493317B2 (en) 2005-10-20 2009-02-17 Omniture, Inc. Result-based triggering for presentation of online content
US7730013B2 (en) 2005-10-25 2010-06-01 International Business Machines Corporation System and method for searching dates efficiently in a collection of web documents
KR100755678B1 (ko) 2005-10-28 2007-09-05 Samsung Electronics Co., Ltd. Apparatus and method for detecting named entities
US7532979B2 (en) 2005-11-10 2009-05-12 Tele Atlas North America, Inc. Method and system for creating universal location referencing objects
US7574449B2 (en) 2005-12-02 2009-08-11 Microsoft Corporation Content matching
US8954426B2 (en) 2006-02-17 2015-02-10 Google Inc. Query language
US7555471B2 (en) 2006-01-27 2009-06-30 Google Inc. Data object visualization
US7454398B2 (en) 2006-02-17 2008-11-18 Google Inc. Support for object search
US7774328B2 (en) * 2006-02-17 2010-08-10 Google Inc. Browseable fact repository
US8260785B2 (en) 2006-02-17 2012-09-04 Google Inc. Automatic object reference identification and linking in a browseable fact repository
US7991797B2 (en) 2006-02-17 2011-08-02 Google Inc. ID persistence through normalization
US8700568B2 (en) 2006-02-17 2014-04-15 Google Inc. Entity normalization via name normalization
US8712192B2 (en) 2006-04-20 2014-04-29 Microsoft Corporation Geo-coding images
US9286404B2 (en) 2006-06-28 2016-03-15 Nokia Technologies Oy Methods of systems using geographic meta-metadata in information retrieval and document displays
US7685201B2 (en) * 2006-09-08 2010-03-23 Microsoft Corporation Person disambiguation using name entity extraction-based clustering
US8458207B2 (en) * 2006-09-15 2013-06-04 Microsoft Corporation Using anchor text to provide context
US8122026B1 (en) 2006-10-20 2012-02-21 Google Inc. Finding and disambiguating references to entities on web pages
US7698336B2 (en) 2006-10-26 2010-04-13 Microsoft Corporation Associating geographic-related information with objects
US7917154B2 (en) * 2006-11-01 2011-03-29 Yahoo! Inc. Determining mobile content for a social network based on location and time
US8108501B2 (en) * 2006-11-01 2012-01-31 Yahoo! Inc. Searching and route mapping based on a social network, location, and time
US8347202B1 (en) 2007-03-14 2013-01-01 Google Inc. Determining geographic locations for place names in a fact repository
US8316007B2 (en) * 2007-06-28 2012-11-20 Oracle International Corporation Automatically finding acronyms and synonyms in a corpus
US8812435B1 (en) 2007-11-16 2014-08-19 Google Inc. Learning objects and facts from documents
US8024281B2 (en) 2008-02-29 2011-09-20 Red Hat, Inc. Alpha node hashing in a rule engine

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
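CN102200983A (zh) * 2010-03-25 2011-09-28 NEC (China) Co., Ltd. Attribute extraction apparatus and method
CN102662986A (zh) * 2012-01-13 2012-09-12 Institute of Computing Technology, Chinese Academy of Sciences Microblog message retrieval system and method
CN105488105A (zh) * 2015-11-19 2016-04-13 Baidu Online Network Technology (Beijing) Co., Ltd. Method for building an information extraction template, and method and apparatus for processing knowledge data
CN105488105B (zh) * 2015-11-19 2019-11-05 Baidu Online Network Technology (Beijing) Co., Ltd. Method for building an information extraction template, and method and apparatus for processing knowledge data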

Also Published As

Publication number Publication date
WO2006132793A3 (en) 2007-02-08
US20060293879A1 (en) 2006-12-28
US20140372473A1 (en) 2014-12-18
US7769579B2 (en) 2010-08-03
CA2610208C (en) 2012-07-10
CA2610208A1 (en) 2006-12-14
US20070150800A1 (en) 2007-06-28
CN101253498B (zh) 2010-12-08
US9558186B2 (en) 2017-01-31
US8825471B2 (en) 2014-09-02
US20070143317A1 (en) 2007-06-21
EP1891557A2 (en) 2008-02-27
WO2006132793A2 (en) 2006-12-14

Similar Documents

Publication Publication Date Title
CN101253498B (zh) Learning facts from semi-structured text
US7831545B1 (en) Identifying the unifying subject of a set of facts
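CN102402604B (zh) Efficient forward ranking in a search engine
CN102687138B (zh) Search suggestion clustering and presentation
CN101185074B (zh) User interface for facts query engine with snippets from information sources that include query terms and answer terms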
US8046681B2 (en) Techniques for inducing high quality structural templates for electronic documents
US8707167B2 (en) High precision data extraction
US9323731B1 (en) Data extraction using templates
US20090125529A1 (en) Extracting information based on document structure and characteristics of attributes
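CN100514323C (zh) System and method for automatically extracting subtitle information
CN102662969B (zh) Method for locating Internet information objects based on web page structural semantics
CN106294535B (zh) Website identification method and apparatus
CN103425687A (zh) Keyword-based retrieval method and system
JP2010501096A (ja) Joint optimization of wrapper generation and template detection
CN102122295A (zh) Document search engine including highlighting of confident results
CN101118555A (zh) System and method for generating associative information for keywords
CN101128820A (zh) Document segmentation based on visual gaps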
US20080281827A1 (en) Using structured database for webpage information extraction
US7421416B2 (en) Method of managing web sites registered in search engine and a system thereof
US20030177115A1 (en) System and method for automatic preparation and searching of scanned documents
US11222013B2 (en) Custom named entities and tags for natural language search query processing
CN112395418B (zh) Method, apparatus, and electronic device for extracting target objects from web pages
Biagioli et al. The NIR project: Standards and tools for legislative drafting and legal document web publication
CN109948015B (zh) Method and system for extracting meta-search result lists
CN112989142A (zh) System, method, and apparatus for processing configurable tags

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CP01 Change in the name or title of a patent holder
CP01 Change in the name or title of a patent holder

Address after: California, USA

Patentee after: Google LLC

Address before: California, USA

Patentee before: Google Inc.