CN101253498A - Learning facts from semi-structured text - Google Patents
Learning facts from semi-structured text
- Publication number
- CN101253498A CNA2006800280576A CN200680028057A
- Authority
- CN
- China
- Prior art keywords
- document
- property value
- value
- context pattern
- seed
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/93—Document management systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
- G06F16/353—Clustering; Classification into predefined classes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/80—Information retrieval; Database structures therefor; File system structures therefor of semi-structured data, e.g. markup language structured data such as SGML, XML or HTML
- G06F16/84—Mapping; Conversion
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10—TECHNICAL SUBJECTS COVERED BY FORMER USPC
- Y10S—TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10S707/00—Data processing: database and file management or data structures
- Y10S707/953—Organization of data
- Y10S707/962—Entity-attribute-value
Abstract
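A method and system for learning, or bootstrapping, facts from semi-structured text is described. Starting with a set of seed facts associated with an object, documents associated with the object are identified. The identified documents are checked to determine whether each has at least a first predefined number of the seed facts. If a document has at least the first predefined number of seed facts, a contextual pattern associated with the seed facts is identified, and other instances of content in the document that match the contextual pattern are identified. If the document includes at least a second predefined number of other content instances matching the contextual pattern, facts may be extracted from those other instances.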
Description
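This application is related to the following applications, each of which is hereby incorporated by reference:
U.S. Patent Application No. 11/097,688, "Corroborating Facts Extracted from Multiple Sources," filed March 31, 2005;
U.S. Patent Application No. 11/097,690, "Selecting the Best Answer to a Fact Query from Among a Set of Potential Answers," filed March 31, 2005;
U.S. Patent Application No. 11/097,689, "User Interface for Facts Query Engine with Snippets from Information Sources that Include Query Terms and Answer Terms," filed March 31, 2005;
U.S. Patent Application No. 11/142,740, "Merging Objects in a Facts Database," filed May 31, 2005;
U.S. Patent Application No. 11/142,748, "System for Ensuring the Internal Consistency of a Fact Repository," filed May 31, 2005;
U.S. Patent Application No. 11/142,765, "Identifying the Unifying Subject of a Set of Facts," filed May 31, 2005.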
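Technical Field
The disclosed embodiments relate generally to fact databases. More particularly, the disclosed embodiments relate to learning facts from documents that include factual information presented in semi-structured text.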
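Background
The World Wide Web (also known as the "Web") and the web pages within it are a vast source of factual information. Users may look to web pages for answers to factual questions, such as "What is the capital of Poland?" or "What is the birth date of George Washington?" The factual information included in web pages may be extracted and stored in a fact database.
Extraction of factual information from web pages may be performed by automated processes. Such automated processes are not perfect, however. They may miss some factual information and/or misidentify non-factual information as factual and extract it. Furthermore, the process may extract incorrect factual information, either because the information in the web page was incorrect to begin with or because the automated process misinterpreted the information in the web page. Missed factual information reduces the coverage of the fact database, and incorrect facts diminish its quality.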
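Summary
According to one aspect of the invention, a method of learning facts includes: accessing an object having a name and one or more seed attribute-value pairs; identifying a set of documents associated with the object name, each document in the set having at least a first predefined number of the seed attribute-value pairs of the object; and, for each document in the identified set: identifying in the document a contextual pattern associated with the seed attribute-value pairs in the document; confirming that the document includes at least a second predefined number of additional instances of content matching the contextual pattern; and, when the confirming is successful, extracting attribute-value pairs from respective instances of content matching the contextual pattern and merging the extracted attribute-value pairs into the object.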
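Brief Description of the Drawings
FIG. 1 illustrates a network, according to some embodiments of the invention.
FIG. 2 is a flow diagram illustrating a process for learning facts, according to some embodiments of the invention.
FIG. 3 illustrates a data structure for an object and associated facts in a fact repository, according to some embodiments of the invention.
FIG. 4 illustrates a document processing system, according to some embodiments of the invention.
Like reference numerals refer to corresponding parts throughout the drawings.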
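Detailed Description
Facts in a fact repository may be verified, and additional facts may be found and extracted, by a bootstrapping process. Starting with one or more seed facts associated with an object, documents that are associated with the object and include at least a predefined number of the seed facts are identified. The contextual patterns surrounding the seed facts in these documents are identified. Using the contextual patterns, other content in the documents having the same contextual patterns is found. Facts are identified from the other content having the same contextual pattern. The identified facts may be added to the fact repository or used to verify facts already in the fact repository. In other words, the process learns by bootstrapping: it uses facts already in the fact repository to verify facts and to find additional facts to add to the repository.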
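FIG. 1 illustrates a network 100, according to some embodiments of the invention. Network 100 includes one or more document hosts 102 and a fact repository engine 106. Network 100 also includes one or more networks 104 that couple these components.
A document host 102 stores documents and provides access to them. A document may be any machine-readable data including any combination of text, graphics, multimedia content, and so on. In some embodiments, a document may be a combination of text, graphics and possibly other forms of information written in the Hypertext Markup Language (HTML), i.e., a web page. A document may include one or more hyperlinks to other documents. A document may include one or more facts within its content. A document stored in a document host 102 may be located and/or identified by a Uniform Resource Locator (URL), or Web address, or any other appropriate form of identification and/or location. Each document may also be associated with a page importance metric. The page importance metric of a document measures the importance, popularity or reputation of the document relative to other documents. In some embodiments, the page importance metric is the PageRank of the document. For more information on the PageRank metric and its computation, see, for example, Page et al., "The PageRank citation ranking: Bringing order to the web," Stanford Digital Libraries Working Paper, 1998; Haveliwala, "Topic-sensitive PageRank," 11th International World Wide Web Conference, Honolulu, Hawaii, May 7-11, 2002; Richardson and Domingos, "The Intelligent Surfer: Probabilistic Combination of Link and Content Information in PageRank," Vol. 14, MIT Press, Cambridge, MA, 2002; Jeh and Widom, "Scaling personalized web search," 12th International World Wide Web Conference, Budapest, Hungary, May 20-24, 2002; Brin and Page, "The Anatomy of a Large-Scale Hypertextual Search Engine," 7th International World Wide Web Conference, Brisbane, Australia, April 14-18, 1998; and U.S. Patent No. 6,285,999, each of which is hereby incorporated by reference in its entirety as background information.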
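The fact repository engine 106 includes an importer 108, a repository manager 110, a fact index 112, and a fact repository 114. The importer 108 extracts factual information from documents stored on document hosts 102. The importer 108 analyzes the contents of the documents stored in document host 102, determines whether the contents include factual information and the one or more subjects with which that factual information is associated, and extracts any available factual information within the contents.
The repository manager 110 processes facts extracted by the importer 108. The repository manager 110 builds and manages the fact repository 114 and the fact index 112. The repository manager 110 receives facts extracted by the importer 108 and stores them in the fact repository 114. The repository manager 110 may also perform operations on facts in the fact repository 114 to "clean up" the data within it. For example, the repository manager 110 may look through the fact repository 114 to find duplicate facts (that is, facts that convey exactly the same factual information) and merge them. The repository manager 110 may also normalize facts into standard formats. The repository manager 110 may also remove unwanted facts from the fact repository 114, such as facts meeting predefined objectionable content criteria.
The fact repository 114 stores factual information extracted from a plurality of documents located on the document hosts 102. In other words, the fact repository 114 is a database of factual information. A document from which a particular fact may be extracted is a source document (or "source") of that fact. In other words, a source of a fact includes that fact within its content. Source documents may include, without limitation, web pages. Within the fact repository 114, entities, concepts, and the like for which the fact repository 114 may have factual information stored are represented by objects. An object may have one or more facts associated with it. Each object is a collection of facts. In some embodiments, an object that has no facts associated with it (an empty object) may be viewed as a non-existent object within the fact repository 114. Within each object, each fact associated with the object is stored as an attribute-value pair. Each fact also includes a list of source documents that include the fact within their contents and from which the fact was extracted. Further details about objects and facts in the fact repository are described below, in relation to FIG. 3.
The fact index 112 provides an index to the fact repository 114 and facilitates efficient lookup of information in the fact repository 114. The fact index 112 may index the fact repository 114 based on one or more parameters. For example, the fact index 112 may have an index that maps unique terms (e.g., words, numbers and the like) to records or locations within the fact repository 114. More specifically, the fact index 112 may include entries mapping every term in every object name, fact attribute and fact value of the fact repository to records or locations within the fact repository.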
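A term-to-record index of this kind can be sketched as an inverted index. The Python below is only an illustration: the tuple layout of a fact record, the naive `tokenize` helper, and the name `build_fact_index` are assumptions made for this example and are not part of the disclosed system.

```python
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase terms on whitespace (deliberately naive)."""
    return text.lower().split()


def build_fact_index(facts):
    """Map each term to the IDs of facts whose object name, attribute, or value contains it.

    `facts` is an iterable of (fact_id, object_name, attribute, value) tuples;
    this record layout is an assumption for the example.
    """
    index = defaultdict(set)
    for fact_id, object_name, attribute, value in facts:
        for field_text in (object_name, attribute, value):
            for term in tokenize(field_text):
                index[term].add(fact_id)
    return index


facts = [
    (1, "Marilyn Monroe", "Born", "June 1, 1926"),
    (2, "Marilyn Monroe", "Died", "August 5, 1962"),
    (3, "George Washington", "Born", "February 22, 1732"),
]
index = build_fact_index(facts)
```

Looking up `index["born"]` returns the IDs of every fact that mentions the term, which is the lookup the fact index is described as accelerating.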
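It should be appreciated that each of the components of the fact repository engine 106 may be distributed over multiple computers. For example, the fact repository 114 may be deployed over N servers, with a mapping function such as the "modulo N" function being used to determine which facts are stored in each of the N servers. Similarly, the fact index 112 may be distributed over multiple servers, and the importer 108 and repository manager 110 may each be distributed over multiple computers. However, for convenience of explanation, the components of the fact repository engine 106 are discussed as though they were implemented on a single computer.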
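As a minimal sketch of a "modulo N" mapping, the Python below assigns an object to one of N servers by hashing its identifier. The text does not say what quantity is reduced modulo N; hashing a string object ID with CRC-32 is an assumption chosen here because it is stable across processes, and `shard_for` is a hypothetical name.

```python
import zlib


def shard_for(object_id: str, num_servers: int) -> int:
    """Return the index (0 .. num_servers - 1) of the server that stores `object_id`."""
    if num_servers < 1:
        raise ValueError("num_servers must be at least 1")
    # crc32 is deterministic across runs, unlike Python's built-in hash() for str.
    return zlib.crc32(object_id.encode("utf-8")) % num_servers
```

Every fact of a given object lands on the same server, so a lookup for that object needs to contact only one of the N servers.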
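FIG. 2 is a flow diagram illustrating a process for learning facts, according to some embodiments of the invention. An object having one or more facts identifiable as attribute-value pairs (hereinafter "A-V pairs") is identified (202). Objects and A-V pairs are described in further detail below, in relation to FIG. 3. The identified object may be an object in a fact repository. Among the A-V pairs associated with the object are one or more seed A-V pairs (seed facts).
Documents associated with the object are identified (204). The document identification may be done by performing a search using the object name as the search term. In some embodiments, the search may be a search for documents accessible via the Web that include the object name. In other words, a Web search for documents matching the object name is performed. The search may be performed using a search engine, such as a Web search engine. If the object has more than one name (as described below in relation to FIG. 3), then in some embodiments one of the names (for example, the primary name) may be used as the search term.
The seed A-V pairs may be all of the A-V pairs associated with the identified object, or they may be a subset of the A-V pairs identified for the object. In other words, the identified object has a set of one or more A-V pairs, and the seed A-V pairs of the object are at least a subset of that set. Which A-V pairs associated with the object are seed A-V pairs may be based on predefined criteria. For example, the seed A-V pairs may be the A-V pairs that have more than one listed source in their lists of sources. As another example, the seed A-V pairs may be the A-V pairs whose confidence scores exceed a predefined confidence threshold. More generally, the seed A-V pairs may be the A-V pairs that are deemed to be reliable.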
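The two example seed criteria (multiple listed sources, or a confidence score above a threshold) can be sketched as a simple filter. This is a hedged illustration: the `AVPair` record, its field names, and the default thresholds are assumptions for the example, not definitions from the text.

```python
from dataclasses import dataclass, field


@dataclass
class AVPair:
    """Hypothetical attribute-value pair record used only in this sketch."""
    attribute: str
    value: str
    sources: list = field(default_factory=list)
    confidence: float = 0.0


def select_seeds(pairs, min_sources=2, min_confidence=None):
    """Return the pairs treated as reliable enough to serve as seeds.

    If `min_confidence` is given, keep pairs whose confidence exceeds it;
    otherwise keep pairs that list at least `min_sources` sources.
    """
    if min_confidence is not None:
        return [p for p in pairs if p.confidence > min_confidence]
    return [p for p in pairs if len(p.sources) >= min_sources]


pairs = [
    AVPair("Born", "June 1, 1926", ["http://a.example", "http://b.example"], 2.0),
    AVPair("Died", "August 5, 1962", ["http://a.example"], 1.0),
]
seeds_by_sources = select_seeds(pairs)
seeds_by_confidence = select_seeds(pairs, min_confidence=0.5)
```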
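One of the identified documents is selected (206), and the document is checked to determine whether it has, within its content, at least a first predefined number (designated "M" in FIG. 2) of different values of the seed A-V pairs. In other words, a validity check is performed on the selected document. One validity requirement is that the document must contain at least M different values of the seed A-V pairs. For convenience, the value of a seed A-V pair is referred to below as a "seed value." In some embodiments, M is 2, while in other embodiments M is an integer greater than 2. In some embodiments, the validity requirement may be that the document has M different facts corresponding to M different seed A-V pairs.
In some embodiments, additional validity requirements may include whether the seed values included in the document are close to or far from one another within the document, whether the seed values are located in the same area of the document (e.g., the same frame in a web page), and whether the A-V pairs having the seed values in the document have similar HTML markup.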
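The validity check can be sketched as follows. The code assumes exact substring matching of seed values and measures proximity in characters; both are simplifications chosen for this example, and the names `seed_positions`, `is_valid_document`, and `max_span` are hypothetical.

```python
def seed_positions(document_text, seed_values):
    """Return {seed value: offset of its first occurrence} for each distinct seed value found."""
    positions = {}
    for value in set(seed_values):
        offset = document_text.find(value)
        if offset != -1:
            positions[value] = offset
    return positions


def is_valid_document(document_text, seed_values, m=2, max_span=None):
    """Check that at least `m` distinct seed values occur in the document.

    If `max_span` is given, additionally require that the found seed values
    start within `max_span` characters of one another (a proximity requirement).
    """
    positions = seed_positions(document_text, seed_values)
    if len(positions) < m:
        return False
    if max_span is not None:
        offsets = sorted(positions.values())
        return offsets[-1] - offsets[0] <= max_span
    return True


doc = "<b>Born:</b>June 1, 1926<br><b>Died:</b>August 5, 1962<br>"
seeds = ["June 1, 1926", "August 5, 1962", "Norma Jeane"]
```

With `m=2` the sample document passes because two of the three seed values appear; raising `m` to 3 or tightening `max_span` makes it fail.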
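If the document is not valid because it does not include at least M seed values and/or it fails to satisfy other validity requirements (208-no), and if there are other documents awaiting validation (224-no), another document may be selected for validation (206). If there are no more documents to validate (224-yes), the process ends (226).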
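If the selected document is valid (208-yes), one or more contextual patterns surrounding the content having the seed values are identified (210). A contextual pattern is the visual structure of the content that includes a seed value and of the nearby content, which provides context for the seed value. For example, the contextual pattern may be a table or a list. In some embodiments, the contextual pattern may be identified by identifying the HTML markup associated with the content having the seed value and with content near the seed value. The HTML markup defines how the content is to be rendered by a client application for presentation to a user; that is, the HTML markup defines the visual structure of the content. For example, seed values may be presented in a list of attributes and associated values with the following HTML markup: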
<b>Name:</b>Marilyn Monroe<br>
<b>Born:</b>June 1,1926<br>
<b>Died:</b>August 5,1962<br>
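where the "<b>" and "</b>" tags specify that the text between them is to be rendered in bold, and the "<br>" tag inserts a line break between successive entries in the list.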
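For the list markup shown above, one simple way to represent the contextual pattern is a regular expression with one capture group for the attribute and one for the value. The sketch below is an assumption about how such a pattern could be encoded; the text does not prescribe regular expressions, and `LIST_PATTERN` and `extract_pairs` are hypothetical names.

```python
import re

# Contextual pattern "<b>attribute:</b>value<br>" expressed as a regular expression.
LIST_PATTERN = re.compile(r"<b>([^<:]+):</b>([^<]+)<br>")


def extract_pairs(html):
    """Return (attribute, value) tuples for every instance matching the list pattern."""
    return [(attr.strip(), value.strip()) for attr, value in LIST_PATTERN.findall(html)]


html = (
    "<b>Name:</b>Marilyn Monroe<br>\n"
    "<b>Born:</b>June 1,1926<br>\n"
    "<b>Died:</b>August 5,1962<br>\n"
)
pairs = extract_pairs(html)
```

Real pages are rarely this regular, which is one reason a tag-tree approach may be more robust than matching raw markup with regular expressions.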
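In some embodiments, more than one contextual pattern may be identified for the seed values included in the document. In some cases, not all of the seed values in the document will have the same contextual pattern. For example, some seed values may be in a list while others are in a table. Thus, one contextual pattern may be identified for some of the seed values in the document and another contextual pattern for other seed values in the document. More generally, one or more contextual patterns may be identified, each of which surrounds at least one seed value.
In some embodiments, the identification of contextual patterns may be facilitated by generating an HTML tag tree of the document. An HTML tag tree is a tree data structure that maps the nested structure of the HTML tags in the document. By generating the HTML tag tree and determining where in the tree the content having the seed values is located, the HTML markup that constitutes the contextual pattern of the content may be identified.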
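Locating seed values in the tag tree can be approximated without building the full tree by recording the stack of open tags around each text node. The sketch below uses Python's standard `html.parser`; the class `TagPathFinder`, the `VOID_TAGS` set, and the choice of a tag path as the pattern signature are assumptions made for illustration.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not be pushed on the stack.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class TagPathFinder(HTMLParser):
    """Records the path of open tags around the first text node containing each target."""

    def __init__(self, targets):
        super().__init__()
        self.targets = list(targets)
        self.stack = []
        self.paths = {}

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed tags in between.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        for target in self.targets:
            if target in data:
                self.paths.setdefault(target, tuple(self.stack))


def find_tag_paths(html, targets):
    """Return {target: tuple of enclosing tag names, outermost first}."""
    finder = TagPathFinder(targets)
    finder.feed(html)
    finder.close()
    return finder.paths


table_html = (
    "<table><tr><td>Born</td><td>June 1, 1926</td></tr>"
    "<tr><td>Died</td><td>August 5, 1962</td></tr></table>"
)
paths = find_tag_paths(table_html, ["June 1, 1926", "August 5, 1962"])
```

Two seed values that share the same tag path, as both dates do here, are evidence that they sit in the same contextual pattern (a table row cell in this case).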
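Other instances of the identified contextual pattern (or patterns) in the document are identified (212). This includes searching the document for matches to the identified contextual pattern (or patterns). The HTML tag tree may be used to find content having a matching contextual pattern. For example, if the contextual pattern is "<b>attribute:</b>value<br>", the other instances may be other occurrences of "<b>attribute:</b>value<br>" nearby (e.g., other items in the same list). As another example, if the contextual pattern is a table, the other instances may be other entries in the same table that includes the seed values. In some embodiments, the identified additional instances of the identified contextual pattern are distinct instances of that pattern, representing facts that differ from one another and from the facts represented by the seed A-V pairs.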
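If the number of identified other instances matching the contextual pattern is not at least a second predefined number (designated "N" in FIG. 2) (214-no), processing of the selected and validated document ends. In some embodiments, N is 2, while in other embodiments N is an integer greater than 2. If there are any other documents to validate (224-no), another document may be selected for validation and processing (206). If there are no more documents to validate (224-yes), the process ends (226).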
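In some embodiments, the N instances matching the contextual pattern do not include the instances associated with the seed values, from which the contextual pattern was identified. In other words, the document is checked for whether it has N additional instances of content matching the contextual pattern, beyond the instances of content associated with the seed values included in the document. In some other embodiments, the N instances matching the contextual pattern include the instances associated with the seed values. That is, the one or more instances associated with the seed values, from which the contextual pattern was identified, may be counted as part of the N instances. Furthermore, in some embodiments, the additional instances of content matching the contextual pattern must be close to one another in the document: the instances are either consecutive in the document or at most a predefined distance apart.
In some embodiments, if more than one contextual pattern was identified at 210, the decision at 214 may be whether the document includes at least N instances of at least one of the identified contextual patterns. If no contextual pattern has N matching instances in the document (214-no), processing of the document ends. If at least N matching instances exist in the document for at least one of the identified contextual patterns (214-yes), facts identifiable as A-V pairs may be extracted from each of the identified contextual patterns that has at least N matching instances, as described below.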
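The decision at 214 for several candidate patterns can be sketched as below, using the variant in which seed-value instances do not count toward N. Encoding each contextual pattern as a compiled regular expression with (attribute, value) groups is an assumption for this example, as are the function names and the sample markup.

```python
import re


def additional_instances(html, pattern, seed_values):
    """Return (attribute, value) matches of `pattern` whose value is not a seed value."""
    seeds = set(seed_values)
    return [
        (attr.strip(), value.strip())
        for attr, value in pattern.findall(html)
        if value.strip() not in seeds
    ]


def patterns_meeting_threshold(html, patterns, seed_values, n=2):
    """Keep only the patterns that have at least `n` additional matching instances."""
    kept = {}
    for name, pattern in patterns.items():
        extra = additional_instances(html, pattern, seed_values)
        if len(extra) >= n:
            kept[name] = extra
    return kept


patterns = {
    "list": re.compile(r"<b>([^<:]+):</b>([^<]+)<br>"),
    "table": re.compile(r"<tr><td>([^<]+)</td><td>([^<]+)</td></tr>"),
}
html = (
    "<b>Name:</b>Marilyn Monroe<br><b>Born:</b>June 1, 1926<br>"
    "<b>Died:</b>August 5, 1962<br><b>Occupation:</b>Actress<br>"
    "<table><tr><td>Born</td><td>June 1, 1926</td></tr>"
    "<tr><td>Birthplace</td><td>Los Angeles</td></tr></table>"
)
seed_values = ["June 1, 1926", "August 5, 1962"]
kept = patterns_meeting_threshold(html, patterns, seed_values, n=2)
```

Here the list pattern has two non-seed instances and is kept, while the table pattern has only one and is discarded, so facts would be extracted from the list alone.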
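If the document does have at least N additional instances of content matching the contextual pattern (or patterns) (214-yes), facts identifiable as A-V pairs are identified and extracted from the other instances matching the contextual pattern (216). An extracted A-V pair may be a new A-V pair for the object or an A-V pair that is already associated with the object and stored in the fact repository 114 (a pre-existing A-V pair). For a pre-existing A-V pair, the A-V pair is not stored again in the fact repository 114; instead, the list of sources of that A-V pair in the fact repository 114 is updated (218). The list of sources (further details of which are described below, in relation to FIG. 3) lists the documents that include, within their content, the fact represented by the A-V pair. New A-V pairs are merged into the object (220) and stored in the fact repository 114. Each new A-V pair merged into the object also includes a list of sources.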
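The merge step (216 to 220) can be sketched with a dictionary that maps each A-V pair to its list of sources. Representing an object's facts this way, exact-match comparison of pairs, and the name `merge_pairs` are all assumptions made for this example.

```python
def merge_pairs(object_facts, extracted, source_url):
    """Merge extracted (attribute, value) pairs into `object_facts`.

    `object_facts` maps (attribute, value) -> list of source URLs and is
    updated in place. Pre-existing pairs only gain a source; new pairs are
    added with a fresh source list. Returns the pairs that were new.
    """
    new_pairs = []
    for pair in extracted:
        if pair in object_facts:
            # Pre-existing pair: do not store it again, just record the source.
            if source_url not in object_facts[pair]:
                object_facts[pair].append(source_url)
        else:
            object_facts[pair] = [source_url]
            new_pairs.append(pair)
    return new_pairs


object_facts = {("Born", "June 1, 1926"): ["http://a.example"]}
extracted = [("Born", "June 1, 1926"), ("Occupation", "Actress")]
new_pairs = merge_pairs(object_facts, extracted, "http://b.example")
```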
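A confidence score may be determined for each A-V pair (222). In some embodiments, the confidence score is simply a count of the documents that include the A-V pair within their content. In other words, it is the number of sources listed in the list of sources of the A-V pair. In some other embodiments, the confidence score may be the count of sources that include the A-V pair, weighted by the page importance metric of each source document. In other words, the confidence score is:
$$\mathrm{confidence}(p) = \sum_{s \in \mathrm{sources}(p)} \mathrm{importance}(s)$$
where sources(p) is the list of sources of the A-V pair p and importance(s) is the page importance metric of source document s.
More generally, the confidence score may be based on the number of sources in the list of sources and on other factors.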
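Both confidence variants described above (a plain source count, and a count weighted by page importance) fit in a few lines. In this sketch the `importance` mapping from source URL to a page importance value, the treatment of unknown sources as weight 0, and the function name are assumptions for illustration.

```python
def confidence_score(sources, importance=None):
    """Return the confidence score of an A-V pair given its list of sources.

    Without `importance`, the score is the number of sources. With it, the
    score is the sum of the importance values of the sources; sources with
    no known importance contribute 0 (an assumption of this sketch).
    """
    if importance is None:
        return float(len(sources))
    return float(sum(importance.get(source, 0.0) for source in sources))


sources = ["http://a.example", "http://b.example"]
importance = {"http://a.example": 0.5, "http://b.example": 0.25}
```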
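After the A-V pairs have been extracted from the additional instances of content and processed, if there are other documents associated with the object awaiting validation (224-no), another document is selected (206). Otherwise (224-yes), the process ends (226). It should be appreciated, however, that the process may be performed at another time to learn additional facts or to verify facts associated with the object. The seed facts for subsequent performances of the process may be drawn from the A-V pairs that were merged into the object, as described above, and from the facts that were already associated with the object at the start of the process. That is, both new and pre-existing A-V pairs may be used as seed A-V pairs in subsequent performances of the process. The process may be performed as needed or at predefined intervals. Furthermore, the process may be performed for other objects in the fact repository.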
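FIG. 3 illustrates an exemplary data structure for an object within the fact repository 114, according to some embodiments of the invention. As described above, the fact repository 114 includes objects, each of which may include one or more facts. Each object 300 includes a unique identifier, such as the object ID 302. The object 300 includes one or more facts 304. Each fact 304 includes a unique identifier for that fact, such as a fact ID 310. Each fact 304 includes an attribute 312 and a value 314. For example, facts included in an object representing George Washington may include facts having the attributes "date of birth" and "date of death," and the values of these facts would be the actual date of birth and date of death, respectively. A fact 304 may include a link 316 to another object, which is an object identifier, such as the object ID 302 of another object within the fact repository 114. The link 316 allows objects to have facts whose values are other objects. For example, for an object "United States," there may be a fact with the attribute "president" whose value is "George W. Bush," with "George W. Bush" being another object in the fact repository 114. In some embodiments, the value field 314 stores the name of the linked object and the link 316 stores the object identifier of the linked object. In some other embodiments, facts 304 do not include a link field 316 because the value 314 of a fact 304 may store a link to another object.
Each fact 304 also may include one or more metrics 318. The metrics may provide indications of the quality of the fact. In some embodiments, the metrics include a confidence level and an importance level. The confidence level indicates the likelihood that the fact is correct. The importance level indicates the relevance of the fact to the object, compared to other facts for the same object. The importance level may optionally be viewed as a measure of how vital a fact is to an understanding of the entity or concept represented by the object.
Each fact 304 includes a list of sources 320 that include the fact and from which the fact was extracted. Each source may be identified by a Uniform Resource Locator (URL), or Web address, or any other appropriate form of identification and/or location, such as a unique document identifier.
In some embodiments, some facts may include an agent field 322 that identifies the module that extracted the fact. For example, the agent may be a specialized module that extracts facts from a specific source (e.g., the pages of a particular web site or family of web sites) or type of source (e.g., web pages that present factual information in tabular form), or a module that extracts facts from free text in documents throughout the Web, and so forth.
In some embodiments, an object 300 may have one or more specialized facts, such as a name fact 306 and a property fact 308. A name fact 306 is a fact that conveys a name for the entity or concept represented by the object 300. For example, for an object representing the country Spain, there may be a fact conveying the name of the object as "Spain." A name fact 306, being a special instance of a general fact 304, includes the same parameters as any other fact 304; it has an attribute, a value, a fact ID, metrics, sources, and so on. The attribute 324 of a name fact 306 indicates that the fact is a name fact, and the value is the actual name. The name may be a string of characters. An object 300 may have one or more name facts, as many entities or concepts can have more than one name. For example, an object representing Spain may have name facts conveying the country's common name "Spain" and its official name "Kingdom of Spain." As another example, an object representing the U.S. Patent and Trademark Office may have name facts conveying the agency's acronyms "PTO" and "USPTO" and its official name "United States Patent and Trademark Office." If an object does have more than one name fact, one of the name facts may be designated as the primary name and the other name facts may be designated as secondary names.
A property fact 308 is a fact that conveys a statement about the entity or concept represented by the object 300 that may be of interest. For example, for the object representing Spain, a property fact may convey that Spain is a country in Europe. A property fact 308, being a special instance of a general fact 304, also includes the same parameters (such as attribute, value, fact ID, and so on) as other facts 304. The attribute field 326 of a property fact 308 indicates that the fact is a property fact, and the value field is a string of text that conveys the statement of interest. For example, for the object representing Spain, the value of a property fact may be the text string "is a country in Europe." Some objects 300 may have one or more property facts, while other objects may have none.
It should be appreciated that the data structure illustrated in FIG. 3 and described above is merely exemplary. The data structure of the fact repository 114 may take on other forms. Other fields may be included in facts, and some of the fields described above may be omitted. Additionally, each object may have additional special facts aside from name facts and property facts, such as facts conveying a type or category (for example, person, place, movie, actor, organization, and so on) for categorizing the entity or concept represented by the object. In some embodiments, an object's names and/or properties may be represented by special records that have a different format than the general fact records 304 associated with the attribute-value pairs of the object.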
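The object and fact records of FIG. 3 map naturally onto two small record types. The sketch below is one possible rendering under stated assumptions: the field names, the use of the literal attribute strings "name" and "property" to mark the specialized facts, and the sample identifiers are all choices made for this example rather than details given in the text.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Fact:
    fact_id: str                                   # fact ID 310
    attribute: str                                 # attribute 312
    value: str                                     # value 314
    link: Optional[str] = None                     # link 316: object ID of a linked object
    metrics: dict = field(default_factory=dict)    # metrics 318, e.g. confidence, importance
    sources: list = field(default_factory=list)    # list of sources 320
    agent: Optional[str] = None                    # agent 322: module that extracted the fact


@dataclass
class FactObject:
    object_id: str                                 # object ID 302
    facts: list = field(default_factory=list)      # facts 304

    def names(self):
        """Return the values of all name facts (attribute "name" in this sketch)."""
        return [f.value for f in self.facts if f.attribute == "name"]

    def properties(self):
        """Return the values of all property facts (attribute "property" in this sketch)."""
        return [f.value for f in self.facts if f.attribute == "property"]


spain = FactObject(
    object_id="obj-spain",
    facts=[
        Fact("f1", "name", "Spain"),
        Fact("f2", "name", "Kingdom of Spain"),
        Fact("f3", "property", "is a country in Europe"),
    ],
)
```

Using mutable defaults through `field(default_factory=...)` keeps each fact's source list independent, which matters because source lists are updated in place as documents are processed.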
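FIG. 4 is a block diagram illustrating a fact learning system 400, according to some embodiments of the invention. The system 400 typically includes one or more processing units (CPUs) 402, one or more network or other communications interfaces 410, memory 412, and one or more communication buses 414 for interconnecting these components. The system 400 optionally may include a user interface 404 comprising a display device 406, a keyboard 408, and a pointer device 409 such as a mouse, track ball or touch-sensitive pad. Memory 412 includes high-speed random access memory, such as DRAM, SRAM, DDR RAM or other random access solid state memory devices, and may include non-volatile memory, such as one or more magnetic disk storage devices, optical disk storage devices, flash memory devices, or other non-volatile solid state storage devices. Memory 412 may optionally include one or more storage devices remotely located from the CPU(s) 402. In some embodiments, memory 412 stores the following programs, modules and data structures, or a subset thereof:
◆ an operating system 416 that includes procedures for handling various basic system services and for performing hardware dependent tasks;
◆ a network communication module (or instructions) 418 that is used for connecting the fact learning system 400 to other computers via the one or more communication network interfaces 410 (wired or wireless), such as the Internet, other wide area networks, local area networks, metropolitan area networks, and so on;
◆ a fact storage interface (or instructions) 420 that is used for connecting the fact learning system 400 to the fact storage system 436 (which may include a fact index and fact repository, and/or other appropriate data structures);
◆ an object access module (or instructions) 422 for accessing objects and associated facts stored in the fact storage system 436;
◆ a document identification module (or instructions) 424 for identifying documents associated with an object and for identifying seed facts within those documents;
◆ a pattern identification module (or instructions) 426 for identifying contextual patterns associated with facts in a document;
◆ a pattern matching module (or instructions) 428 for finding instances of content in a document that match a contextual pattern;
◆ a fact extraction module (or instructions) 430 for extracting facts from documents, merging new facts into objects, and updating lists of documents; and
◆ a confidence module 432 for determining confidence scores of facts.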
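In some embodiments, memory 412 of system 400 includes the fact index, instead of an interface 420 to the fact index. The system 400 also includes a fact storage system 436 for storing facts. As described above, in some embodiments each fact stored in the fact storage system 436 includes a corresponding list of sources from which the respective fact was extracted. The system 400 may also include a search engine 434 for searching documents and/or for searching for facts in the fact storage system. However, in other embodiments, the "back end system" that extracts facts from source documents and adds them to the fact storage system 436 may be a totally different system than a "front end" that includes a search engine for searching the fact storage system. The front end system, which is not the subject of the present document, may receive a copy of the fact repository and fact index built by the back end system.
It should be appreciated that at least some of the modules described above may be grouped together as one module. For example, the modules 426 and 428 may be grouped into a pattern module.
Each of the above identified elements may be stored in one or more of the previously mentioned memory devices, and corresponds to a set of instructions for performing a function described above. The above identified modules or programs (i.e., sets of instructions) need not be implemented as separate software programs, procedures or modules, and thus various subsets of these modules may be combined or otherwise re-arranged in various embodiments. In some embodiments, memory 412 may store a subset of the modules and data structures identified above. Furthermore, memory 412 may store additional modules and data structures not described above.
Although FIG. 4 shows a "fact learning system," FIG. 4 is intended more as a functional description of the various features that may be present in a set of servers than as a structural schematic of the embodiments described herein. In practice, and as recognized by those of ordinary skill in the art, items shown separately could be combined and some items could be separated. For example, some items shown separately in FIG. 4 could be implemented on single servers, and single items could be implemented by one or more servers. The actual number of servers used to implement a fact learning system, and how features are allocated among them, will vary from one implementation to another, and may depend in part on the amount of data traffic that the system must handle during peak usage periods as well as during average usage periods, and may further depend on the size of the fact repository and the amount of fact information each server can efficiently handle.
The foregoing description, for purpose of explanation, has been described with reference to specific embodiments. However, the illustrative discussions above are not intended to be exhaustive or to limit the invention to the precise forms disclosed. Many modifications and variations are possible in view of the above teachings. The embodiments were chosen and described in order to best explain the principles of the invention and its practical applications, to thereby enable others skilled in the art to best utilize the invention and various embodiments with various modifications as are suited to the particular use contemplated.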
Claims (19)
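1. A method of learning facts, comprising:
accessing an object having a name and one or more seed attribute-value pairs;
identifying a set of documents associated with the object name, each document in the set having at least a first predefined number of distinct seed attribute-value pairs of the object;
for each document in the identified set:
identifying in the document a contextual pattern associated with respective seed attribute-value pairs in the document;
confirming that the document includes at least a second predefined number of additional instances of content matching the contextual pattern; and
when the confirming is successful, extracting attribute-value pairs from respective instances of content matching the contextual pattern and merging the extracted attribute-value pairs into the object.
2. The method of claim 1, further comprising repeating the extracting and merging operations for one or more instances of content in the document that match the contextual pattern.
3. The method of claim 1, wherein the extracted and merged attribute-value pair is distinct from all other attribute-value pairs of the object.
4. The method of claim 1, further comprising:
identifying in the document an attribute-value pair that matches a respective attribute-value pair of the object; and
adding an identifier of the document to a list of documents associated with the respective attribute-value pair of the object.
5. The method of claim 4, further comprising generating a confidence score for each attribute-value pair of the object based on the documents in the list of documents associated with that attribute-value pair.
6. The method of claim 4, further comprising generating, for each attribute-value pair of the object, a confidence score corresponding to a number of documents in the list of documents associated with that attribute-value pair.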
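7. A system for learning facts, comprising:
one or more modules having instructions for:
accessing an object having a name and one or more seed attribute-value pairs;
identifying a set of documents associated with the object name, each document in the set having at least a first predefined number of distinct seed attribute-value pairs of the object;
for each document in the identified set:
identifying in the document a contextual pattern associated with respective seed attribute-value pairs in the document; and
confirming that the document includes at least a second predefined number of additional instances of content matching the contextual pattern; and
extracting attribute-value pairs from respective instances of content matching the contextual pattern and merging the extracted attribute-value pairs into the object.
8. The system of claim 7, wherein the one or more modules include instructions for repeatedly extracting and merging attribute-value pairs from instances of content in the document that match the contextual pattern.
9. The system of claim 7, wherein the extracted and merged attribute-value pair is distinct from all other attribute-value pairs of the object.
10. The system of claim 7, wherein the one or more modules include instructions for:
identifying in the document an attribute-value pair that matches a respective attribute-value pair of the object; and
adding an identifier of the document to a list of documents associated with the respective attribute-value pair of the object.
11. The system of claim 10, further comprising instructions for generating a confidence score for each attribute-value pair of the object based on the documents in the list of documents associated with that attribute-value pair.
12. The system of claim 10, further comprising instructions for generating, for each attribute-value pair of the object, a confidence score corresponding to a number of documents in the list of documents associated with that attribute-value pair.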
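13. A computer program product for use in conjunction with a computer system, the computer program product comprising a computer readable storage medium and a computer program mechanism embedded therein, the computer program mechanism comprising instructions for:
accessing an object having a name and one or more seed attribute-value pairs;
identifying a set of documents associated with the object name, each document in the set having at least a first predefined number of distinct seed attribute-value pairs of the object;
for each document in the identified set:
identifying in the document a contextual pattern associated with respective seed attribute-value pairs in the document;
confirming that the document includes at least a second predefined number of additional instances of content matching the contextual pattern; and
when the confirming is successful, extracting attribute-value pairs from respective instances of content matching the contextual pattern and merging the extracted attribute-value pairs into the object.
14. The computer program product of claim 13, further comprising repeating the extracting and merging operations for one or more instances of content in the document that match the contextual pattern.
15. The computer program product of claim 13, wherein the extracted and merged attribute-value pair is distinct from all other attribute-value pairs of the object.
16. The computer program product of claim 13, further comprising instructions for:
identifying in the document an attribute-value pair that matches a respective attribute-value pair of the object; and
adding an identifier of the document to a list of documents associated with the respective attribute-value pair of the object.
17. The computer program product of claim 16, further comprising instructions for generating a confidence score for each attribute-value pair of the object based on the documents in the list of documents associated with that attribute-value pair.
18. The computer program product of claim 16, further comprising instructions for generating, for each attribute-value pair of the object, a confidence score corresponding to a number of documents in the list of documents associated with that attribute-value pair.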
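19. A system for learning facts, comprising:
means for accessing an object having a name and one or more seed attribute-value pairs;
means for identifying a set of documents associated with the object name, wherein each document in the set has at least a first predefined number of distinct seed attribute-value pairs of the object;
means for, for each document in the identified set:
identifying in the document a contextual pattern associated with respective seed attribute-value pairs in the document;
confirming that the document includes at least a second predefined number of additional instances of content matching the contextual pattern; and
when the confirming is successful, extracting attribute-value pairs from respective instances of content matching the contextual pattern and merging the extracted attribute-value pairs into the object.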
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/142,853 | 2005-05-31 | ||
US11/142,853 US7769579B2 (en) | 2005-05-31 | 2005-05-31 | Learning facts from semi-structured text |
PCT/US2006/019807 WO2006132793A2 (en) | 2005-05-31 | 2006-05-18 | Learning facts from semi-structured text |
Publications (2)
Publication Number | Publication Date |
---|---|
CN101253498A (zh) | 2008-08-27 |
CN101253498B (zh) | 2010-12-08 |
Family
ID=37309560
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN2006800280576A Active CN101253498B (zh) | 2005-05-31 | 2006-05-18 | Learning facts from semi-structured text |
Country Status (5)
Country | Link |
---|---|
US (4) | US7769579B2 (zh) |
EP (1) | EP1891557A2 (zh) |
CN (1) | CN101253498B (zh) |
CA (1) | CA2610208C (zh) |
WO (1) | WO2006132793A2 (zh) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
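CN102200983A (zh) * | 2010-03-25 | 2011-09-28 | NEC (China) Co., Ltd. | Attribute extraction apparatus and method |
CN102662986A (zh) * | 2012-01-13 | 2012-09-12 | Institute of Computing Technology, Chinese Academy of Sciences | Microblog message retrieval system and method |
CN105488105A (zh) * | 2015-11-19 | 2016-04-13 | Baidu Online Network Technology (Beijing) Co., Ltd. | Method for establishing an information extraction template, and method and apparatus for processing knowledge data |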
Families Citing this family (106)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8244689B2 (en) | 2006-02-17 | 2012-08-14 | Google Inc. | Attribute entropy as a signal in object normalization |
US7769579B2 (en) | 2005-05-31 | 2010-08-03 | Google Inc. | Learning facts from semi-structured text |
US9208229B2 (en) | 2005-03-31 | 2015-12-08 | Google Inc. | Anchor text summarization for corroboration |
US7587387B2 (en) | 2005-03-31 | 2009-09-08 | Google Inc. | User interface for facts query engine with snippets from information sources that include query terms and answer terms |
US8682913B1 (en) | 2005-03-31 | 2014-03-25 | Google Inc. | Corroborating facts extracted from multiple sources |
US7567976B1 (en) * | 2005-05-31 | 2009-07-28 | Google Inc. | Merging objects in a facts database |
US8996470B1 (en) | 2005-05-31 | 2015-03-31 | Google Inc. | System for ensuring the internal consistency of a fact repository |
US7831545B1 (en) | 2005-05-31 | 2010-11-09 | Google Inc. | Identifying the unifying subject of a set of facts |
US7512620B2 (en) * | 2005-08-19 | 2009-03-31 | Google Inc. | Data structure for incremental search |
US8260785B2 (en) | 2006-02-17 | 2012-09-04 | Google Inc. | Automatic object reference identification and linking in a browseable fact repository |
US7991797B2 (en) | 2006-02-17 | 2011-08-02 | Google Inc. | ID persistence through normalization |
US8700568B2 (en) | 2006-02-17 | 2014-04-15 | Google Inc. | Entity normalization via name normalization |
US9495358B2 (en) | 2006-10-10 | 2016-11-15 | Abbyy Infopoisk Llc | Cross-language text clustering |
US8122026B1 (en) | 2006-10-20 | 2012-02-21 | Google Inc. | Finding and disambiguating references to entities on web pages |
US8285697B1 (en) | 2007-01-23 | 2012-10-09 | Google Inc. | Feedback enhanced attribute extraction |
US8347202B1 (en) | 2007-03-14 | 2013-01-01 | Google Inc. | Determining geographic locations for place names in a fact repository |
US7739212B1 (en) * | 2007-03-28 | 2010-06-15 | Google Inc. | System and method for updating facts in a fact repository |
US8239350B1 (en) | 2007-05-08 | 2012-08-07 | Google Inc. | Date ambiguity resolution |
US7761473B2 (en) * | 2007-05-18 | 2010-07-20 | Microsoft Corporation | Typed relationships between items |
US7966291B1 (en) | 2007-06-26 | 2011-06-21 | Google Inc. | Fact-based object merging |
US7970766B1 (en) | 2007-07-23 | 2011-06-28 | Google Inc. | Entity type assignment |
US8738643B1 (en) | 2007-08-02 | 2014-05-27 | Google Inc. | Learning synonymous object names from anchor texts |
US7984032B2 (en) * | 2007-08-31 | 2011-07-19 | Microsoft Corporation | Iterators for applying term occurrence-level constraints in natural language searching |
US8812435B1 (en) * | 2007-11-16 | 2014-08-19 | Google Inc. | Learning objects and facts from documents |
US8346791B1 (en) | 2008-05-16 | 2013-01-01 | Google Inc. | Search augmentation |
US20090307183A1 (en) * | 2008-06-10 | 2009-12-10 | Eric Arno Vigen | System and Method for Transmission of Communications by Unique Definition Identifiers |
US8645391B1 (en) | 2008-07-03 | 2014-02-04 | Google Inc. | Attribute-value extraction from structured documents |
US20140142920A1 (en) | 2008-08-13 | 2014-05-22 | International Business Machines Corporation | Method and apparatus for Utilizing Structural Information in Semi-Structured Documents to Generate Candidates for Question Answering Systems |
US8412749B2 (en) * | 2009-01-16 | 2013-04-02 | Google Inc. | Populating a structured presentation with new values |
EP2416257A4 (en) * | 2009-03-31 | 2015-04-22 | Fujitsu Ltd | COMPUTER-ASSISTED NAME IDENTIFICATION EQUIPMENT, NAME IDENTIFICATION METHOD, AND NAME IDENTIFICATION PROGRAM |
US8434134B2 (en) | 2010-05-26 | 2013-04-30 | Google Inc. | Providing an electronic document collection |
US8775400B2 (en) * | 2010-06-30 | 2014-07-08 | Microsoft Corporation | Extracting facts from social network messages |
US8346792B1 (en) * | 2010-11-09 | 2013-01-01 | Google Inc. | Query generation using structural similarity between documents |
US9460207B2 (en) * | 2010-12-08 | 2016-10-04 | Microsoft Technology Licensing, Llc | Automated database generation for answering fact lookup queries |
US8655866B1 (en) | 2011-02-10 | 2014-02-18 | Google Inc. | Returning factual answers in response to queries |
US9632994B2 (en) | 2011-03-11 | 2017-04-25 | Microsoft Technology Licensing, Llc | Graphical user interface that supports document annotation |
US9626348B2 (en) | 2011-03-11 | 2017-04-18 | Microsoft Technology Licensing, Llc | Aggregating document annotations |
US9075873B2 (en) | 2011-03-11 | 2015-07-07 | Microsoft Technology Licensing, Llc | Generation of context-informative co-citation graphs |
US9582591B2 (en) | 2011-03-11 | 2017-02-28 | Microsoft Technology Licensing, Llc | Generating visual summaries of research documents |
US8719692B2 (en) | 2011-03-11 | 2014-05-06 | Microsoft Corporation | Validation, rejection, and modification of automatically generated document annotations |
US8768782B1 (en) | 2011-06-10 | 2014-07-01 | Linkedin Corporation | Optimized cloud computing fact checking |
US9087048B2 (en) | 2011-06-10 | 2015-07-21 | Linkedin Corporation | Method of and system for validating a fact checking system |
US9116996B1 (en) | 2011-07-25 | 2015-08-25 | Google Inc. | Reverse question answering |
US8782042B1 (en) * | 2011-10-14 | 2014-07-15 | Firstrain, Inc. | Method and system for identifying entities |
US8856640B1 (en) | 2012-01-20 | 2014-10-07 | Google Inc. | Method and apparatus for applying revision specific electronic signatures to an electronically stored document |
US20130246435A1 (en) * | 2012-03-14 | 2013-09-19 | Microsoft Corporation | Framework for document knowledge extraction |
US8819047B2 (en) | 2012-04-04 | 2014-08-26 | Microsoft Corporation | Fact verification engine |
US9659059B2 (en) * | 2012-07-20 | 2017-05-23 | Salesforce.Com, Inc. | Matching large sets of words |
US9619458B2 (en) | 2012-07-20 | 2017-04-11 | Salesforce.Com, Inc. | System and method for phrase matching with arbitrary text |
US20140052647A1 (en) * | 2012-08-17 | 2014-02-20 | Truth Seal Corporation | System and Method for Promoting Truth in Public Discourse |
EP2916238A4 (en) * | 2012-10-19 | 2016-06-15 | Rakuten Inc | CORPUS CREATIVE DEVICE, CORPUSED CREATION PROCESS AND CORPUSED CREATING PROGRAM |
US9870554B1 (en) | 2012-10-23 | 2018-01-16 | Google Inc. | Managing documents based on a user's calendar |
US9529916B1 (en) | 2012-10-30 | 2016-12-27 | Google Inc. | Managing documents based on access context |
US11308037B2 (en) | 2012-10-30 | 2022-04-19 | Google Llc | Automatic collaboration |
US9483159B2 (en) | 2012-12-12 | 2016-11-01 | Linkedin Corporation | Fact checking graphical user interface including fact checking icons |
US9495341B1 (en) | 2012-12-18 | 2016-11-15 | Google Inc. | Fact correction and completion during document drafting |
US9384285B1 (en) | 2012-12-18 | 2016-07-05 | Google Inc. | Methods for identifying related documents |
US9235626B2 (en) | 2013-03-13 | 2016-01-12 | Google Inc. | Automatic generation of snippets based on context and user interest |
US10713261B2 (en) | 2013-03-13 | 2020-07-14 | Google Llc | Generating insightful connections between graph entities |
US9224103B1 (en) | 2013-03-13 | 2015-12-29 | Google Inc. | Automatic annotation for training and evaluation of semantic analysis engines |
US10810193B1 (en) | 2013-03-13 | 2020-10-20 | Google Llc | Querying a data graph using natural language queries |
US9235653B2 (en) | 2013-06-26 | 2016-01-12 | Google Inc. | Discovering entity actions for an entity graph |
US9342622B2 (en) | 2013-06-27 | 2016-05-17 | Google Inc. | Two-phase construction of data graphs from disparate inputs |
US9514113B1 (en) | 2013-07-29 | 2016-12-06 | Google Inc. | Methods for automatic footnote generation |
US9842113B1 (en) | 2013-08-27 | 2017-12-12 | Google Inc. | Context-based file selection |
US10169424B2 (en) | 2013-09-27 | 2019-01-01 | Lucas J. Myslinski | Apparatus, systems and methods for scoring and distributing the reliability of online information |
US20150095320A1 (en) | 2013-09-27 | 2015-04-02 | Trooclick France | Apparatus, systems and methods for scoring the reliability of online information |
US9785696B1 (en) | 2013-10-04 | 2017-10-10 | Google Inc. | Automatic discovery of new entities using graph reconciliation |
WO2015051480A1 (en) | 2013-10-09 | 2015-04-16 | Google Inc. | Automatic definition of entity collections |
US9798829B1 (en) | 2013-10-22 | 2017-10-24 | Google Inc. | Data graph interface |
US10002117B1 (en) | 2013-10-24 | 2018-06-19 | Google Llc | Translating annotation tags into suggested markup |
US9529791B1 (en) | 2013-12-12 | 2016-12-27 | Google Inc. | Template and content aware document and template editing |
US9659056B1 (en) | 2013-12-30 | 2017-05-23 | Google Inc. | Providing an explanation of a missing fact estimate |
RU2586577C2 (ru) | 2014-01-15 | 2016-06-10 | ABBYY InfoPoisk LLC | Filtering arcs in a syntactic graph |
US9972055B2 (en) | 2014-02-28 | 2018-05-15 | Lucas J. Myslinski | Fact checking method and system utilizing social networking information |
US9643722B1 (en) | 2014-02-28 | 2017-05-09 | Lucas J. Myslinski | Drone device security system |
US8990234B1 (en) * | 2014-02-28 | 2015-03-24 | Lucas J. Myslinski | Efficient fact checking method and system |
US9703763B1 (en) | 2014-08-14 | 2017-07-11 | Google Inc. | Automatic document citations by utilizing copied content for candidate sources |
US9189514B1 (en) | 2014-09-04 | 2015-11-17 | Lucas J. Myslinski | Optimized fact checking method and system |
US20160078364A1 (en) * | 2014-09-17 | 2016-03-17 | Microsoft Corporation | Computer-Implemented Identification of Related Items |
US9672251B1 (en) * | 2014-09-29 | 2017-06-06 | Google Inc. | Extracting facts from documents |
US9626358B2 (en) | 2014-11-26 | 2017-04-18 | Abbyy Infopoisk Llc | Creating ontologies by analyzing natural language texts |
US20160162576A1 (en) * | 2014-12-05 | 2016-06-09 | Lightning Source Inc. | Automated content classification/filtering |
US9594554B2 (en) * | 2015-07-30 | 2017-03-14 | International Business Machines Corporation | Extraction and transformation of executable online documentation |
US10318564B2 (en) | 2015-09-28 | 2019-06-11 | Microsoft Technology Licensing, Llc | Domain-specific unstructured text retrieval |
US10354188B2 (en) | 2016-08-02 | 2019-07-16 | Microsoft Technology Licensing, Llc | Extracting facts from unstructured information |
US10268965B2 (en) | 2015-10-27 | 2019-04-23 | Yardi Systems, Inc. | Dictionary enhancement technique for business name categorization |
US10275841B2 (en) | 2015-10-27 | 2019-04-30 | Yardi Systems, Inc. | Apparatus and method for efficient business name categorization |
US10275708B2 (en) | 2015-10-27 | 2019-04-30 | Yardi Systems, Inc. | Criteria enhancement technique for business name categorization |
US10274983B2 (en) | 2015-10-27 | 2019-04-30 | Yardi Systems, Inc. | Extended business name categorization apparatus and method |
US11216718B2 (en) | 2015-10-27 | 2022-01-04 | Yardi Systems, Inc. | Energy management system |
US10346448B2 (en) | 2016-07-13 | 2019-07-09 | Google Llc | System and method for classifying an alphanumeric candidate identified in an email message |
US11568274B2 (en) * | 2016-08-05 | 2023-01-31 | Google Llc | Surfacing unique facts for entities |
RU2640718C1 (ru) | 2016-12-22 | 2018-01-11 | ABBYY Production LLC | Verification of attributes of information objects |
US10255271B2 (en) * | 2017-02-06 | 2019-04-09 | International Business Machines Corporation | Disambiguation of the meaning of terms based on context pattern detection |
US10572601B2 (en) | 2017-07-28 | 2020-02-25 | International Business Machines Corporation | Unsupervised template extraction |
WO2019090318A1 (en) | 2017-11-06 | 2019-05-09 | Cornell University | Verifying text summaries of relational data sets |
US11144530B2 (en) * | 2017-12-21 | 2021-10-12 | International Business Machines Corporation | Regulating migration and recall actions for high latency media (HLM) on objects or group of objects through metadata locking attributes |
US11194832B2 (en) * | 2018-09-13 | 2021-12-07 | Sap Se | Normalization of unstructured catalog data |
JP6998282B2 (ja) * | 2018-09-19 | 2022-01-18 | Yahoo Japan Corporation | Information processing device, information processing method, and program |
US11308141B2 (en) * | 2018-12-26 | 2022-04-19 | Yahoo Assets Llc | Template generation using directed acyclic word graphs |
US11170064B2 (en) | 2019-03-05 | 2021-11-09 | Corinne David | Method and system to filter out unwanted content from incoming social media data |
US11829722B2 (en) * | 2019-05-31 | 2023-11-28 | Nec Corporation | Parameter learning apparatus, parameter learning method, and computer readable recording medium |
US11630849B2 (en) * | 2020-02-21 | 2023-04-18 | International Business Machines Corporation | Optimizing insight generation in heterogeneous datasets |
JP7197531B2 (ja) * | 2020-03-19 | 2022-12-27 | Yahoo Japan Corporation | Information processing device, information processing system, information processing method, and program |
US11443101B2 (en) | 2020-11-03 | 2022-09-13 | International Business Machine Corporation | Flexible pseudo-parsing of dense semi-structured text |
Family Cites Families (299)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5010478A (en) | 1986-04-11 | 1991-04-23 | Deran Roger L | Entity-attribute value database system with inverse attribute for selectively relating two different entities |
US5133075A (en) | 1988-12-19 | 1992-07-21 | Hewlett-Packard Company | Method of monitoring changes in attribute values of object in an object-oriented database |
US5440730A (en) | 1990-08-09 | 1995-08-08 | Bell Communications Research, Inc. | Time index access structure for temporal databases having concurrent multiple versions |
CA2048306A1 (en) | 1990-10-02 | 1992-04-03 | Steven P. Miller | Distributed configuration profile for computing system |
US5347653A (en) | 1991-06-28 | 1994-09-13 | Digital Equipment Corporation | System for reconstructing prior versions of indexes using records indicating changes between successive versions of the indexes |
US5694590A (en) | 1991-09-27 | 1997-12-02 | The Mitre Corporation | Apparatus and method for the detection of security violations in multilevel secure databases |
JPH05174020A (ja) | 1991-12-26 | 1993-07-13 | Okinawa Nippon Denki Software Kk | Japanese language processing device |
US5574898A (en) | 1993-01-08 | 1996-11-12 | Atria Software, Inc. | Dynamic software version auditor which monitors a process to provide a list of objects that are accessed |
US7082426B2 (en) | 1993-06-18 | 2006-07-25 | Cnet Networks, Inc. | Content aggregation method and apparatus for an on-line product catalog |
US5519608A (en) * | 1993-06-24 | 1996-05-21 | Xerox Corporation | Method for extracting from a text corpus answers to questions stated in natural language by using linguistic analysis and hypothesis generation |
US5546507A (en) | 1993-08-20 | 1996-08-13 | Unisys Corporation | Apparatus and method for generating a knowledge base |
US5560005A (en) | 1994-02-25 | 1996-09-24 | Actamed Corp. | Methods and systems for object-based relational distributed databases |
US5680622A (en) | 1994-06-30 | 1997-10-21 | Borland International, Inc. | System and methods for quickly detecting shareability of symbol and type information in header files |
US5675785A (en) | 1994-10-04 | 1997-10-07 | Hewlett-Packard Company | Data warehouse which is accessed by a user using a schema of virtual tables |
JP2809341B2 (ja) * | 1994-11-18 | 1998-10-08 | 松下電器産業株式会社 | Information summarizing method, information summarizing device, weighting method, and teletext broadcast receiving device |
US5608903A (en) | 1994-12-15 | 1997-03-04 | Novell, Inc. | Method and apparatus for moving subtrees in a distributed network directory |
US5717911A (en) * | 1995-01-23 | 1998-02-10 | Tandem Computers, Inc. | Relational database system and method with high availability compliation of SQL programs |
US5793966A (en) | 1995-12-01 | 1998-08-11 | Vermeer Technologies, Inc. | Computer system and computer-implemented process for creation and maintenance of online services |
US5724571A (en) * | 1995-07-07 | 1998-03-03 | Sun Microsystems, Inc. | Method and apparatus for generating query responses in a computer-based document retrieval system |
US5717951A (en) | 1995-08-07 | 1998-02-10 | Yabumoto; Kan W. | Method for storing and retrieving information on a magnetic storage medium via data blocks of variable sizes |
US6006221A (en) | 1995-08-16 | 1999-12-21 | Syracuse University | Multilingual document retrieval system and method using semantic vector matching |
US5838979A (en) | 1995-10-31 | 1998-11-17 | Peritus Software Services, Inc. | Process and tool for scalable automated data field replacement |
US5701470A (en) | 1995-12-08 | 1997-12-23 | Sun Microsystems, Inc. | System and method for space efficient object locking using a data subarray and pointers |
US5815415A (en) | 1996-01-19 | 1998-09-29 | Bentley Systems, Incorporated | Computer system for portable persistent modeling |
US5802299A (en) | 1996-02-13 | 1998-09-01 | Microtouch Systems, Inc. | Interactive system for authoring hypertext document collections |
US5778378A (en) | 1996-04-30 | 1998-07-07 | International Business Machines Corporation | Object oriented information retrieval framework mechanism |
US5920859A (en) | 1997-02-05 | 1999-07-06 | Idd Enterprises, L.P. | Hypertext document retrieval system and method |
US5819210A (en) | 1996-06-21 | 1998-10-06 | Xerox Corporation | Method of lazy contexted copying during unification |
US6052693A (en) | 1996-07-02 | 2000-04-18 | Harlequin Group Plc | System for assembling large databases through information extracted from text sources |
US5987460A (en) | 1996-07-05 | 1999-11-16 | Hitachi, Ltd. | Document retrieval-assisting method and system for the same and document retrieval service using the same with document frequency and term frequency |
US5819265A (en) | 1996-07-12 | 1998-10-06 | International Business Machines Corporation | Processing names in a text |
US5778373A (en) | 1996-07-15 | 1998-07-07 | At&T Corp | Integration of an information server database schema by generating a translation map from exemplary files |
US5787413A (en) | 1996-07-29 | 1998-07-28 | International Business Machines Corporation | C++ classes for a digital library |
US6820093B2 (en) | 1996-07-30 | 2004-11-16 | Hyperphrase Technologies, Llc | Method for verifying record code prior to an action based on the code |
US5826258A (en) | 1996-10-02 | 1998-10-20 | Junglee Corporation | Method and apparatus for structuring the querying and interpretation of semistructured information |
US6285999B1 (en) | 1997-01-10 | 2001-09-04 | The Board Of Trustees Of The Leland Stanford Junior University | Method for node ranking in a linked database |
US7269587B1 (en) | 1997-01-10 | 2007-09-11 | The Board Of Trustees Of The Leland Stanford Junior University | Scoring documents in a linked database |
AUPO525497A0 (en) * | 1997-02-21 | 1997-03-20 | Mills, Dudley John | Network-based classified information systems |
US6134555A (en) | 1997-03-10 | 2000-10-17 | International Business Machines Corporation | Dimension reduction using association rules for data mining application |
US5822743A (en) | 1997-04-08 | 1998-10-13 | 1215627 Ontario Inc. | Knowledge-based information retrieval system |
US5882743A (en) | 1997-04-21 | 1999-03-16 | Kimberly-Clark Worldwide, Inc. | Absorbent folded hand towel |
US6038560A (en) * | 1997-05-21 | 2000-03-14 | Oracle Corporation | Concept knowledge base search and retrieval system |
US5974254A (en) | 1997-06-06 | 1999-10-26 | National Instruments Corporation | Method for detecting differences between graphical programs |
US5893093A (en) * | 1997-07-02 | 1999-04-06 | The Sabre Group, Inc. | Information search and retrieval with geographical coordinates |
AU735024B2 (en) | 1997-07-25 | 2001-06-28 | British Telecommunications Public Limited Company | Scheduler for a software system |
DE69803575T2 (de) | 1997-07-25 | 2002-08-29 | British Telecomm | Visualization in a modular software system |
AU753202B2 (en) | 1997-07-25 | 2002-10-10 | British Telecommunications Public Limited Company | Software system generation |
US5909689A (en) | 1997-09-18 | 1999-06-01 | Sony Corporation | Automatic update of file versions for files shared by several computers which record in respective file directories temporal information for indicating when the files have been created |
US6073130A (en) * | 1997-09-23 | 2000-06-06 | At&T Corp. | Method for improving the results of a search in a structured database |
US6442540B2 (en) * | 1997-09-29 | 2002-08-27 | Kabushiki Kaisha Toshiba | Information retrieval apparatus and information retrieval method |
US6996572B1 (en) | 1997-10-08 | 2006-02-07 | International Business Machines Corporation | Method and system for filtering of information entities |
US6018741A (en) * | 1997-10-22 | 2000-01-25 | International Business Machines Corporation | Method and system for managing objects in a dynamic inheritance tree |
US6112210A (en) | 1997-10-31 | 2000-08-29 | Oracle Corporation | Apparatus and method for null representation in database object storage |
WO1999027556A2 (en) * | 1997-11-20 | 1999-06-03 | Xacct Technologies, Inc. | Network accounting and billing system and method |
US5943670A (en) | 1997-11-21 | 1999-08-24 | International Business Machines Corporation | System and method for categorizing objects in combined categories |
US6349275B1 (en) | 1997-11-24 | 2002-02-19 | International Business Machines Corporation | Multiple concurrent language support system for electronic catalogue using a concept based knowledge representation |
US6212526B1 (en) | 1997-12-02 | 2001-04-03 | Microsoft Corporation | Method for apparatus for efficient mining of classification models from databases |
US6094650A (en) | 1997-12-15 | 2000-07-25 | Manning & Napier Information Services | Database analysis using a probabilistic ontology |
FI106089B (fi) | 1997-12-23 | 2000-11-15 | Sonera Oyj | Tracking of a mobile terminal in a mobile communication system |
JPH11265400A (ja) | 1998-03-13 | 1999-09-28 | Omron Corp | Information processing device and method, network system, and recording medium |
US6044366A (en) | 1998-03-16 | 2000-03-28 | Microsoft Corporation | Use of the UNPIVOT relational operator in the efficient gathering of sufficient statistics for data mining |
US6078918A (en) | 1998-04-02 | 2000-06-20 | Trivada Corporation | Online predictive memory |
US6112203A (en) | 1998-04-09 | 2000-08-29 | Altavista Company | Method for ranking documents in a hyperlinked environment using connectivity and selective content analysis |
US6567846B1 (en) * | 1998-05-15 | 2003-05-20 | E.Piphany, Inc. | Extensible user interface for a distributed messaging framework in a computer network |
US6122647A (en) | 1998-05-19 | 2000-09-19 | Perspecta, Inc. | Dynamic generation of contextual links in hypertext documents |
US6742003B2 (en) | 2001-04-30 | 2004-05-25 | Microsoft Corporation | Apparatus and accompanying methods for visualizing clusters of data and hierarchical cluster classifications |
US6327574B1 (en) | 1998-07-07 | 2001-12-04 | Encirq Corporation | Hierarchical models of consumer attributes for targeting content in a privacy-preserving manner |
US6240546B1 (en) | 1998-07-24 | 2001-05-29 | International Business Machines Corporation | Identifying date fields for runtime year 2000 system solution process, method and article of manufacture |
US7409381B1 (en) * | 1998-07-30 | 2008-08-05 | British Telecommunications Public Limited Company | Index to a semi-structured database |
US6665837B1 (en) | 1998-08-10 | 2003-12-16 | Overture Services, Inc. | Method for identifying related pages in a hyperlinked database |
US6694482B1 (en) | 1998-09-11 | 2004-02-17 | Sbc Technology Resources, Inc. | System and methods for an architectural framework for design of an adaptive, personalized, interactive content delivery system |
US6470330B1 (en) | 1998-11-05 | 2002-10-22 | Sybase, Inc. | Database system with methods for estimation and usage of index page cluster ratio (IPCR) and data page cluster ratio (DPCR) |
FR2787957B1 (fr) * | 1998-12-28 | 2001-10-05 | Inst Nat Rech Inf Automat | Method for processing a query |
US6572661B1 (en) | 1999-01-11 | 2003-06-03 | Cisco Technology, Inc. | System and method for automated annotation of files |
US6377943B1 (en) * | 1999-01-20 | 2002-04-23 | Oracle Corp. | Initial ordering of tables for database queries |
US7003719B1 (en) | 1999-01-25 | 2006-02-21 | West Publishing Company, Dba West Group | System, method, and software for inserting hyperlinks into documents |
US6565610B1 (en) | 1999-02-11 | 2003-05-20 | Navigation Technologies Corporation | Method and system for text placement when forming maps |
US6574635B2 (en) | 1999-03-03 | 2003-06-03 | Siebel Systems, Inc. | Application instantiation based upon attributes and values stored in a meta data repository, including tiering of application layers objects and components |
US6584464B1 (en) | 1999-03-19 | 2003-06-24 | Ask Jeeves, Inc. | Grammar template query system |
US6397228B1 (en) * | 1999-03-31 | 2002-05-28 | Verizon Laboratories Inc. | Data enhancement techniques |
US6763496B1 (en) | 1999-03-31 | 2004-07-13 | Microsoft Corporation | Method for promoting contextual information to display pages containing hyperlinks |
US6263328B1 (en) | 1999-04-09 | 2001-07-17 | International Business Machines Corporation | Object oriented query model and process for complex heterogeneous database queries |
US20030195872A1 (en) * | 1999-04-12 | 2003-10-16 | Paul Senn | Web-based information content analyzer and information dimension dictionary |
US6721713B1 (en) | 1999-05-27 | 2004-04-13 | Andersen Consulting Llp | Business alliance identification in a web architecture framework |
US6606625B1 (en) | 1999-06-03 | 2003-08-12 | University Of Southern California | Wrapper induction by hierarchical data analysis |
US6711585B1 (en) | 1999-06-15 | 2004-03-23 | Kanisa Inc. | System and method for implementing a knowledge management system |
US6438543B1 (en) | 1999-06-17 | 2002-08-20 | International Business Machines Corporation | System and method for cross-document coreference |
US6473898B1 (en) | 1999-07-06 | 2002-10-29 | Pcorder.Com, Inc. | Method for compiling and selecting data attributes |
US6873982B1 (en) * | 1999-07-16 | 2005-03-29 | International Business Machines Corporation | Ordering of database search results based on user feedback |
EP1072987A1 (en) * | 1999-07-29 | 2001-01-31 | International Business Machines Corporation | Geographic web browser and iconic hyperlink cartography |
US6341306B1 (en) * | 1999-08-13 | 2002-01-22 | Atomica Corporation | Web-based information retrieval responsive to displayed word identified by a text-grabbing algorithm |
CA2281331A1 (en) | 1999-09-03 | 2001-03-03 | Cognos Incorporated | Database management system |
US6845354B1 (en) * | 1999-09-09 | 2005-01-18 | Institute For Information Industry | Information retrieval system with a neuro-fuzzy structure |
US6754873B1 (en) | 1999-09-20 | 2004-06-22 | Google Inc. | Techniques for finding related hyperlinked documents using link-based analysis |
GB2371901B (en) | 1999-09-21 | 2004-06-23 | Andrew E Borthwick | A probabilistic record linkage model derived from training data |
AU2702701A (en) | 1999-10-15 | 2001-04-23 | Milind Kotwal | Method of categorization and indexing of information |
US6665666B1 (en) | 1999-10-26 | 2003-12-16 | International Business Machines Corporation | System, method and program product for answering questions using a search engine |
US6850896B1 (en) | 1999-10-28 | 2005-02-01 | Market-Touch Corporation | Method and system for managing and providing sales data using world wide web |
JP3888812B2 (ja) * | 1999-11-01 | 2007-03-07 | 富士通株式会社 | Fact data integration method and device |
US6804667B1 (en) | 1999-11-30 | 2004-10-12 | Ncr Corporation | Filter for checking for duplicate entries in database |
US6963867B2 (en) | 1999-12-08 | 2005-11-08 | A9.Com, Inc. | Search query processing to provide category-ranked presentation of search results |
US7305380B1 (en) | 1999-12-15 | 2007-12-04 | Google Inc. | Systems and methods for performing in-context searching |
US6865582B2 (en) | 2000-01-03 | 2005-03-08 | Bechtel Bwxt Idaho, Llc | Systems and methods for knowledge discovery in spatial data |
US6606659B1 (en) | 2000-01-28 | 2003-08-12 | Websense, Inc. | System and method for controlling access to internet sites |
US6665659B1 (en) | 2000-02-01 | 2003-12-16 | James D. Logan | Methods and apparatus for distributing and using metadata via the internet |
US6567936B1 (en) | 2000-02-08 | 2003-05-20 | Microsoft Corporation | Data clustering using error-tolerant frequent item sets |
AU2001241564A1 (en) | 2000-02-17 | 2001-08-27 | E-Numerate Solutions, Inc. | Rdl search engine |
US6584646B2 (en) | 2000-02-29 | 2003-07-01 | Katoh Electrical Machinery Co., Ltd. | Tilt hinge for office automation equipment |
US6901403B1 (en) | 2000-03-02 | 2005-05-31 | Quovadx, Inc. | XML presentation of general-purpose data sources |
US6311194B1 (en) | 2000-03-15 | 2001-10-30 | Taalee, Inc. | System and method for creating a semantic web and its applications in browsing, searching, profiling, personalization and advertising |
US6738767B1 (en) * | 2000-03-20 | 2004-05-18 | International Business Machines Corporation | System and method for discovering schematic structure in hypertext documents |
US6502102B1 (en) | 2000-03-27 | 2002-12-31 | Accenture Llp | System, method and article of manufacture for a table-driven automated scripting architecture |
US6643641B1 (en) | 2000-04-27 | 2003-11-04 | Russell Snyder | Web search engine with graphic snapshots |
EP1156430A2 (en) | 2000-05-17 | 2001-11-21 | Matsushita Electric Industrial Co., Ltd. | Information retrieval system |
US6957213B1 (en) | 2000-05-17 | 2005-10-18 | Inquira, Inc. | Method of utilizing implicit references to answer a query |
US7062483B2 (en) | 2000-05-18 | 2006-06-13 | Endeca Technologies, Inc. | Hierarchical data-driven search and navigation system and method for information retrieval |
US7325201B2 (en) * | 2000-05-18 | 2008-01-29 | Endeca Technologies, Inc. | System and method for manipulating content in a hierarchical data-driven search and navigation system |
WO2001090921A2 (en) | 2000-05-25 | 2001-11-29 | Kanisa, Inc. | System and method for automatically classifying text |
US6487495B1 (en) | 2000-06-02 | 2002-11-26 | Navigation Technologies Corporation | Navigation applications using related location-referenced keywords |
US6963876B2 (en) | 2000-06-05 | 2005-11-08 | International Business Machines Corporation | System and method for searching extended regular expressions |
US6745189B2 (en) | 2000-06-05 | 2004-06-01 | International Business Machines Corporation | System and method for enabling multi-indexing of objects |
US20020042707A1 (en) | 2000-06-19 | 2002-04-11 | Gang Zhao | Grammar-packaged parsing |
US7162499B2 (en) * | 2000-06-21 | 2007-01-09 | Microsoft Corporation | Linked value replication |
GB0015233D0 (en) * | 2000-06-21 | 2000-08-16 | Canon Kk | Indexing method and apparatus |
MXPA03000110A (es) * | 2000-06-22 | 2006-06-08 | Mayer Yaron | System and method for searching for and contacting dates in instant messengers on the network and/or in other methods capable of finding and creating immediate contact |
US7003506B1 (en) * | 2000-06-23 | 2006-02-21 | Microsoft Corporation | Method and system for creating an embedded search link document |
US6578032B1 (en) | 2000-06-28 | 2003-06-10 | Microsoft Corporation | Method and system for performing phrase/word clustering and cluster merging |
US7080085B1 (en) | 2000-07-12 | 2006-07-18 | International Business Machines Corporation | System and method for ensuring referential integrity for heterogeneously scoped references in an information management system |
US6728728B2 (en) | 2000-07-24 | 2004-04-27 | Israel Spiegler | Unified binary model and methodology for knowledge representation and for data and information mining |
US6675159B1 (en) * | 2000-07-27 | 2004-01-06 | Science Applic Int Corp | Concept-based search and retrieval system |
US7146536B2 (en) | 2000-08-04 | 2006-12-05 | Sun Microsystems, Inc. | Fact collection for product knowledge management |
US7100082B2 (en) | 2000-08-04 | 2006-08-29 | Sun Microsystems, Inc. | Check creation and maintenance for product knowledge management |
US7080073B1 (en) | 2000-08-18 | 2006-07-18 | Firstrain, Inc. | Method and apparatus for focused crawling |
US6556991B1 (en) * | 2000-09-01 | 2003-04-29 | E-Centives, Inc. | Item name normalization |
US6823495B1 (en) | 2000-09-14 | 2004-11-23 | Microsoft Corporation | Mapping tool graphical user interface |
US6832218B1 (en) | 2000-09-22 | 2004-12-14 | International Business Machines Corporation | System and method for associating search results |
US7493308B1 (en) * | 2000-10-03 | 2009-02-17 | A9.Com, Inc. | Searching documents using a dimensional database |
US6684205B1 (en) * | 2000-10-18 | 2004-01-27 | International Business Machines Corporation | Clustering hypertext with applications to web searching |
JP2002157276A (ja) | 2000-11-16 | 2002-05-31 | Hitachi Software Eng Co Ltd | Problem-solving support method and system |
US20020174099A1 (en) * | 2000-11-28 | 2002-11-21 | Anthony Raj | Minimal identification |
US7013308B1 (en) * | 2000-11-28 | 2006-03-14 | Semscript Ltd. | Knowledge storage and retrieval system and method |
US8402068B2 (en) * | 2000-12-07 | 2013-03-19 | Half.Com, Inc. | System and method for collecting, associating, normalizing and presenting product and vendor information on a distributed network |
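JP2002230035A (ja) | 2001-01-05 | 2002-08-16 | Internatl Business Mach Corp <Ibm> | Information organizing method, information processing device, information processing system, storage medium, and program transmission device |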
US6879969B2 (en) | 2001-01-21 | 2005-04-12 | Volvo Technological Development Corporation | System and method for real-time recognition of driving patterns |
US6693651B2 (en) * | 2001-02-07 | 2004-02-17 | International Business Machines Corporation | Customer self service iconic interface for resource search results display and selection |
US7143099B2 (en) | 2001-02-08 | 2006-11-28 | Amdocs Software Systems Limited | Historical data warehousing system |
US7216073B2 (en) * | 2001-03-13 | 2007-05-08 | Intelligate, Ltd. | Dynamic natural language understanding |
US6820081B1 (en) | 2001-03-19 | 2004-11-16 | Attenex Corporation | System and method for evaluating a structured message store for message redundancy |
US20020147738A1 (en) | 2001-04-06 | 2002-10-10 | Reader Scot A. | Method and appratus for finding patent-relevant web documents |
JP4159366B2 (ja) | 2001-04-12 | 2008-10-01 | コーニンクレッカ フィリップス エレクトロニクス エヌ ヴィ | Method and system for registering user preferences |
US6556610B1 (en) * | 2001-04-12 | 2003-04-29 | E20 Communications, Inc. | Semiconductor lasers |
US20020169770A1 (en) | 2001-04-27 | 2002-11-14 | Kim Brian Seong-Gon | Apparatus and method that categorize a collection of documents into a hierarchy of categories that are defined by the collection of documents |
US7020662B2 (en) | 2001-05-29 | 2006-03-28 | Sun Microsystems, Inc. | Method and system for determining a directory entry's class of service based on the value of a specifier in the entry |
MXPA03011976A (es) * | 2001-06-22 | 2005-07-01 | Nervana Inc | System and method for knowledge retrieval, management, delivery, and presentation |
US7003552B2 (en) * | 2001-06-25 | 2006-02-21 | Canon Kabushiki Kaisha | Information processing apparatus and control method therefor |
US7263656B2 (en) * | 2001-07-16 | 2007-08-28 | Canon Kabushiki Kaisha | Method and device for scheduling, generating and processing a document comprising blocks of information |
WO2003009251A1 (en) | 2001-07-18 | 2003-01-30 | Hyunjae Tech Co., Ltd | System for automatic recognizing licence number of other vehicles on observation vehicles and method thereof |
JP4571404B2 (ja) * | 2001-07-26 | 2010-10-27 | インターナショナル・ビジネス・マシーンズ・コーポレーション | Data processing method, data processing system, and program |
CA2354443A1 (en) | 2001-07-31 | 2003-01-31 | Ibm Canada Limited-Ibm Canada Limitee | Method and system for visually constructing xml schemas using an object-oriented model |
US6868411B2 (en) * | 2001-08-13 | 2005-03-15 | Xerox Corporation | Fuzzy text categorizer |
US7398201B2 (en) | 2001-08-14 | 2008-07-08 | Evri Inc. | Method and system for enhanced data searching |
WO2003017023A2 (en) | 2001-08-14 | 2003-02-27 | Quigo Technologies, Inc. | System and method for extracting content for submission to a search engine |
US7386832B2 (en) | 2001-08-31 | 2008-06-10 | Siebel Systems, Inc. | Configurator using structure to provide a user interface |
US7058653B2 (en) | 2001-09-17 | 2006-06-06 | Ricoh Company, Ltd. | Tree system diagram output method, computer program and recording medium |
US7403938B2 (en) * | 2001-09-24 | 2008-07-22 | Iac Search & Media, Inc. | Natural language query processing |
US7020641B2 (en) | 2001-10-22 | 2006-03-28 | Sun Microsystems, Inc. | Method, system, and program for maintaining a database of data objects |
US7197449B2 (en) * | 2001-10-30 | 2007-03-27 | Intel Corporation | Method for extracting name entities and jargon terms using a suffix tree data structure |
CN100461156C (zh) * | 2001-11-09 | 2009-02-11 | 无锡永中科技有限公司 | Integrated data processing system |
JP3931214B2 (ja) | 2001-12-17 | 2007-06-13 | 日本アイ・ビー・エム株式会社 | Data analysis device and program |
US6965900B2 (en) | 2001-12-19 | 2005-11-15 | X-Labs Holdings, Llc | Method and apparatus for electronically extracting application specific multidimensional information from documents selected from a set of documents electronically extracted from a library of electronically searchable documents |
US7096231B2 (en) * | 2001-12-28 | 2006-08-22 | American Management Systems, Inc. | Export engine which builds relational database directly from object model |
US7209906B2 (en) * | 2002-01-14 | 2007-04-24 | International Business Machines Corporation | System and method for implementing a metrics engine for tracking relationships over time |
US7398461B1 (en) | 2002-01-24 | 2008-07-08 | Overture Services, Inc. | Method for ranking web page search results |
EP1485825A4 (en) | 2002-02-04 | 2008-03-19 | Cataphora Inc | DETAILED EXPLORATION TECHNIQUE OF SOCIOLOGICAL DATA AND CORRESPONDING APPARATUS |
US20030149567A1 (en) | 2002-02-04 | 2003-08-07 | Tony Schmitz | Method and system for using natural language in computer resource utilization analysis via a communications network |
US7421660B2 (en) | 2003-02-04 | 2008-09-02 | Cataphora, Inc. | Method and apparatus to visually present discussions for data mining purposes |
US20030154071A1 (en) | 2002-02-11 | 2003-08-14 | Shreve Gregory M. | Process for the document management and computer-assisted translation of documents utilizing document corpora constructed by intelligent agents |
US7165024B2 (en) * | 2002-02-22 | 2007-01-16 | Nec Laboratories America, Inc. | Inferring hierarchical descriptions of a set of documents |
JP4098539B2 (ja) | 2002-03-15 | 2008-06-11 | 富士通株式会社 | Profile information recommendation method, program, and device |
US7043521B2 (en) * | 2002-03-21 | 2006-05-09 | Rockwell Electronic Commerce Technologies, Llc | Search agent for searching the internet |
JP3896014B2 (ja) | 2002-03-22 | 2007-03-22 | 株式会社東芝 | Information collection system, information collection method, and program for causing a computer to execute information collection |
CA2479228C (en) | 2002-03-27 | 2011-08-09 | British Telecommunications Public Limited Company | Network security system |
US6857053B2 (en) | 2002-04-10 | 2005-02-15 | International Business Machines Corporation | Method, system, and program for backing up objects by creating groups of objects |
TWI256562B (en) | 2002-05-03 | 2006-06-11 | Ind Tech Res Inst | Method for named-entity recognition and verification |
US6963880B1 (en) | 2002-05-10 | 2005-11-08 | Oracle International Corporation | Schema evolution of complex objects |
US20040015481A1 (en) * | 2002-05-23 | 2004-01-22 | Kenneth Zinda | Patent data mining |
US7003522B1 (en) | 2002-06-24 | 2006-02-21 | Microsoft Corporation | System and method for incorporating smart tags in online content |
US20040003067A1 (en) | 2002-06-27 | 2004-01-01 | Daniel Ferrin | System and method for enabling a user interface with GUI meta data |
US20040006748A1 (en) | 2002-07-03 | 2004-01-08 | Amit Srivastava | Systems and methods for providing online event tracking |
GB0215464D0 (en) | 2002-07-04 | 2002-08-14 | Hewlett Packard Co | Combining data descriptions |
WO2004019264A1 (en) | 2002-08-22 | 2004-03-04 | Agency For Science, Technology And Research | Prediction by collective likelihood from emerging patterns |
US20040059726A1 (en) * | 2002-09-09 | 2004-03-25 | Jeff Hunter | Context-sensitive wordless search |
US20040064447A1 (en) * | 2002-09-27 | 2004-04-01 | Simske Steven J. | System and method for management of synonymic searching |
US6886010B2 (en) * | 2002-09-30 | 2005-04-26 | The United States Of America As Represented By The Secretary Of The Navy | Method for data and text mining and literature-based discovery |
US7096217B2 (en) | 2002-10-31 | 2006-08-22 | International Business Machines Corporation | Global query correlation attributes |
US20050108256A1 (en) | 2002-12-06 | 2005-05-19 | Attensity Corporation | Visualization of integrated structured and unstructured data |
US7277879B2 (en) | 2002-12-17 | 2007-10-02 | Electronic Data Systems Corporation | Concept navigation in data storage systems |
US7181450B2 (en) | 2002-12-18 | 2007-02-20 | International Business Machines Corporation | Method, system, and program for use of metadata to create multidimensional cubes in a relational database |
US20040122846A1 (en) * | 2002-12-19 | 2004-06-24 | Ibm Corporation | Fact verification system |
US7107528B2 (en) | 2002-12-20 | 2006-09-12 | International Business Machines Corporation | Automatic completion of dates |
US7472182B1 (en) | 2002-12-31 | 2008-12-30 | Emc Corporation | Data collection policy for storage devices |
GB0304639D0 (en) | 2003-02-28 | 2003-04-02 | Kiq Ltd | Classification using re-sampling of probability estimates |
US7020666B2 (en) | 2003-03-07 | 2006-03-28 | Microsoft Corporation | System and method for unknown type serialization |
US7051023B2 (en) | 2003-04-04 | 2006-05-23 | Yahoo! Inc. | Systems and methods for generating concept units from search queries |
EP1629359A4 (en) | 2003-04-07 | 2008-01-09 | Sevenecho Llc | METHOD, SYSTEM AND SOFTWARE FOR CUSTOMIZING PERSONALIZED NARRATIVE PRESENTATIONS |
US20040243552A1 (en) * | 2003-05-30 | 2004-12-02 | Dictaphone Corporation | Method, system, and apparatus for viewing data |
US8095544B2 (en) | 2003-05-30 | 2012-01-10 | Dictaphone Corporation | Method, system, and apparatus for validation |
US7747571B2 (en) | 2003-04-15 | 2010-06-29 | At&T Intellectual Property, I,L.P. | Methods, systems, and computer program products for implementing logical and physical data models |
EP1477892B1 (en) * | 2003-05-16 | 2015-12-23 | Sap Se | System, method, computer program product and article of manufacture for inputting data in a computer system |
JP2004362223A (ja) | 2003-06-04 | 2004-12-24 | Hitachi Ltd | Information mining system |
US7836391B2 (en) | 2003-06-10 | 2010-11-16 | Google Inc. | Document search engine including highlighting of confident results |
US9026901B2 (en) | 2003-06-20 | 2015-05-05 | International Business Machines Corporation | Viewing annotations across multiple applications |
US7162473B2 (en) | 2003-06-26 | 2007-01-09 | Microsoft Corporation | Method and system for usage analyzer that determines user accessed sources, indexes data subsets, and associated metadata, processing implicit queries based on potential interest to users |
US7739588B2 (en) | 2003-06-27 | 2010-06-15 | Microsoft Corporation | Leveraging markup language data for semantically labeling text strings and data and for providing actions based on semantically labeled text strings and data |
WO2005008358A2 (en) | 2003-07-22 | 2005-01-27 | Kinor Technologies Inc. | Information access using ontologies |
WO2005010727A2 (en) * | 2003-07-23 | 2005-02-03 | Praedea Solutions, Inc. | Extracting data from semi-structured text documents |
CA2536265C (en) * | 2003-08-21 | 2012-11-13 | Idilia Inc. | System and method for processing a query |
US20050055365A1 (en) * | 2003-09-09 | 2005-03-10 | I.V. Ramakrishnan | Scalable data extraction techniques for transforming electronic documents into queriable archives |
US7644076B1 (en) * | 2003-09-12 | 2010-01-05 | Teradata Us, Inc. | Clustering strings using N-grams |
US8589373B2 (en) | 2003-09-14 | 2013-11-19 | Yaron Mayer | System and method for improved searching on the internet or similar networks and especially improved MetaNews and/or improved automatically generated newspapers |
US8086690B1 (en) | 2003-09-22 | 2011-12-27 | Google Inc. | Determining geographical relevance of web documents |
US7496560B2 (en) * | 2003-09-23 | 2009-02-24 | Amazon Technologies, Inc. | Personalized searchable library with highlighting capabilities |
US7158980B2 (en) * | 2003-10-02 | 2007-01-02 | Acer Incorporated | Method and apparatus for computerized extracting of scheduling information from a natural language e-mail |
AU2003290397A1 (en) | 2003-10-15 | 2005-04-27 | Dharamdas Gautam Goradia | Interactive wisdom system |
JP4729844B2 (ja) * | 2003-10-16 | 2011-07-20 | 富士ゼロックス株式会社 | Server device, information providing method, and program |
KR100533810B1 (ko) * | 2003-10-16 | 2005-12-07 | 한국전자통신연구원 | Method for semi-automatic construction of a knowledge base for an encyclopedia question-answering system |
US20050144241A1 (en) | 2003-10-17 | 2005-06-30 | Stata Raymond P. | Systems and methods for a search-based email client |
GB0325626D0 (en) | 2003-11-03 | 2003-12-10 | Infoshare Ltd | Data aggregation |
US20050108630A1 (en) | 2003-11-19 | 2005-05-19 | Wasson Mark D. | Extraction of facts from text |
US7512553B2 (en) | 2003-12-05 | 2009-03-31 | International Business Machines Corporation | System for automated part-number mapping |
US20050138007A1 (en) | 2003-12-22 | 2005-06-23 | International Business Machines Corporation | Document enhancement method |
US8150824B2 (en) | 2003-12-31 | 2012-04-03 | Google Inc. | Systems and methods for direct navigation to specific portion of target document |
US20050149851A1 (en) | 2003-12-31 | 2005-07-07 | Google Inc. | Generating hyperlinks and anchor text in HTML and non-HTML documents |
US7424467B2 (en) | 2004-01-26 | 2008-09-09 | International Business Machines Corporation | Architecture for an indexer with fixed width sort and variable width sort |
US7499913B2 (en) | 2004-01-26 | 2009-03-03 | International Business Machines Corporation | Method for handling anchor text |
WO2005083597A1 (en) | 2004-02-20 | 2005-09-09 | Dow Jones Reuters Business Interactive, Llc | Intelligent search and retrieval system and method |
US7756823B2 (en) | 2004-03-26 | 2010-07-13 | Lockheed Martin Corporation | Dynamic reference repository |
US7725498B2 (en) | 2004-04-22 | 2010-05-25 | International Business Machines Corporation | Techniques for identifying mergeable data |
US7260573B1 (en) | 2004-05-17 | 2007-08-21 | Google Inc. | Personalizing anchor text scores in a search engine |
US20050278314A1 (en) | 2004-06-09 | 2005-12-15 | Paul Buchheit | Variable length snippet generation |
US7716225B1 (en) | 2004-06-17 | 2010-05-11 | Google Inc. | Ranking documents based on user behavior and/or feature data |
US7454430B1 (en) | 2004-06-18 | 2008-11-18 | Glenbrook Networks | System and method for facts extraction and domain knowledge repository creation from unstructured and semi-structured documents |
US8051207B2 (en) * | 2004-06-25 | 2011-11-01 | Citrix Systems, Inc. | Inferring server state in a stateless communication protocol |
US20060036504A1 (en) | 2004-08-11 | 2006-02-16 | Allocca William W | Dynamically classifying items for international delivery |
US20060041375A1 (en) | 2004-08-19 | 2006-02-23 | Geographic Data Technology, Inc. | Automated georeferencing of digitized map images |
US7809695B2 (en) * | 2004-08-23 | 2010-10-05 | Thomson Reuters Global Resources | Information retrieval systems with duplicate document detection and presentation functions |
US20060047691A1 (en) * | 2004-08-31 | 2006-03-02 | Microsoft Corporation | Creating a document index from a flex- and Yacc-generated named entity recognizer |
US20060053171A1 (en) * | 2004-09-03 | 2006-03-09 | Biowisdom Limited | System and method for curating one or more multi-relational ontologies |
US20060053175A1 (en) * | 2004-09-03 | 2006-03-09 | Biowisdom Limited | System and method for creating, editing, and utilizing one or more rules for multi-relational ontology creation and maintenance |
US20060074910A1 (en) | 2004-09-17 | 2006-04-06 | Become, Inc. | Systems and methods of retrieving topic specific information |
JP4587756B2 (ja) | 2004-09-21 | 2010-11-24 | Renesas Electronics Corporation | Semiconductor integrated circuit device |
US20060064411A1 (en) * | 2004-09-22 | 2006-03-23 | William Gross | Search engine using user intent |
US7809763B2 (en) * | 2004-10-15 | 2010-10-05 | Oracle International Corporation | Method(s) for updating database object metadata |
US7822768B2 (en) | 2004-11-23 | 2010-10-26 | International Business Machines Corporation | System and method for automating data normalization using text analytics |
US9137115B2 (en) | 2004-12-06 | 2015-09-15 | Bmc Software, Inc. | System and method for resource reconciliation in an enterprise management system |
US20060167991A1 (en) | 2004-12-16 | 2006-07-27 | Heikes Brian D | Buddy list filtering |
US20060143227A1 (en) | 2004-12-27 | 2006-06-29 | Helm Martin W | System and method for persisting software objects |
US8719779B2 (en) | 2004-12-28 | 2014-05-06 | Sap Ag | Data object association based on graph theory techniques |
US7672971B2 (en) * | 2006-02-17 | 2010-03-02 | Google Inc. | Modular architecture for entity normalization |
US7769579B2 (en) * | 2005-05-31 | 2010-08-03 | Google Inc. | Learning facts from semi-structured text |
US20060149800A1 (en) | 2004-12-30 | 2006-07-06 | Daniel Egnor | Authoritative document identification |
US7685136B2 (en) | 2005-01-12 | 2010-03-23 | International Business Machines Corporation | Method, system and program product for managing document summary information |
US9208229B2 (en) | 2005-03-31 | 2015-12-08 | Google Inc. | Anchor text summarization for corroboration |
US7953720B1 (en) | 2005-03-31 | 2011-05-31 | Google Inc. | Selecting the best answer to a fact query from among a set of potential answers |
US7587387B2 (en) | 2005-03-31 | 2009-09-08 | Google Inc. | User interface for facts query engine with snippets from information sources that include query terms and answer terms |
US20060238919A1 (en) | 2005-04-20 | 2006-10-26 | The Boeing Company | Adaptive data cleaning |
US20060248456A1 (en) | 2005-05-02 | 2006-11-02 | Ibm Corporation | Assigning a publication date for at least one electronic document |
US20060259462A1 (en) * | 2005-05-12 | 2006-11-16 | Sybase, Inc. | System and Methodology for Real-time Content Aggregation and Syndication |
US7590647B2 (en) | 2005-05-27 | 2009-09-15 | Rage Frameworks, Inc | Method for extracting, interpreting and standardizing tabular data from unstructured documents |
US20060277169A1 (en) | 2005-06-02 | 2006-12-07 | Lunt Tracy T | Using the quantity of electronically readable text to generate a derivative attribute for an electronic file |
US7630977B2 (en) | 2005-06-29 | 2009-12-08 | Xerox Corporation | Categorization including dependencies between different category systems |
US20070005593A1 (en) | 2005-06-30 | 2007-01-04 | Microsoft Corporation | Attribute-based data retrieval and association |
CA2545232A1 (en) * | 2005-07-29 | 2007-01-29 | Cognos Incorporated | Method and system for creating a taxonomy from business-oriented metadata content |
US8666928B2 (en) * | 2005-08-01 | 2014-03-04 | Evi Technologies Limited | Knowledge repository |
US7797282B1 (en) | 2005-09-29 | 2010-09-14 | Hewlett-Packard Development Company, L.P. | System and method for modifying a training set |
US7493317B2 (en) | 2005-10-20 | 2009-02-17 | Omniture, Inc. | Result-based triggering for presentation of online content |
US7730013B2 (en) | 2005-10-25 | 2010-06-01 | International Business Machines Corporation | System and method for searching dates efficiently in a collection of web documents |
KR100755678B1 (ko) | 2005-10-28 | 2007-09-05 | Samsung Electronics Co., Ltd. | Apparatus and method for detecting named entities |
US7532979B2 (en) | 2005-11-10 | 2009-05-12 | Tele Atlas North America, Inc. | Method and system for creating universal location referencing objects |
US7574449B2 (en) | 2005-12-02 | 2009-08-11 | Microsoft Corporation | Content matching |
US8954426B2 (en) | 2006-02-17 | 2015-02-10 | Google Inc. | Query language |
US7555471B2 (en) | 2006-01-27 | 2009-06-30 | Google Inc. | Data object visualization |
US7454398B2 (en) | 2006-02-17 | 2008-11-18 | Google Inc. | Support for object search |
US7774328B2 (en) * | 2006-02-17 | 2010-08-10 | Google Inc. | Browseable fact repository |
US8260785B2 (en) | 2006-02-17 | 2012-09-04 | Google Inc. | Automatic object reference identification and linking in a browseable fact repository |
US7991797B2 (en) | 2006-02-17 | 2011-08-02 | Google Inc. | ID persistence through normalization |
US8700568B2 (en) | 2006-02-17 | 2014-04-15 | Google Inc. | Entity normalization via name normalization |
US8712192B2 (en) | 2006-04-20 | 2014-04-29 | Microsoft Corporation | Geo-coding images |
US9286404B2 (en) | 2006-06-28 | 2016-03-15 | Nokia Technologies Oy | Methods of systems using geographic meta-metadata in information retrieval and document displays |
US7685201B2 (en) * | 2006-09-08 | 2010-03-23 | Microsoft Corporation | Person disambiguation using name entity extraction-based clustering |
US8458207B2 (en) * | 2006-09-15 | 2013-06-04 | Microsoft Corporation | Using anchor text to provide context |
US8122026B1 (en) | 2006-10-20 | 2012-02-21 | Google Inc. | Finding and disambiguating references to entities on web pages |
US7698336B2 (en) | 2006-10-26 | 2010-04-13 | Microsoft Corporation | Associating geographic-related information with objects |
US7917154B2 (en) * | 2006-11-01 | 2011-03-29 | Yahoo! Inc. | Determining mobile content for a social network based on location and time |
US8108501B2 (en) * | 2006-11-01 | 2012-01-31 | Yahoo! Inc. | Searching and route mapping based on a social network, location, and time |
US8347202B1 (en) | 2007-03-14 | 2013-01-01 | Google Inc. | Determining geographic locations for place names in a fact repository |
US8316007B2 (en) * | 2007-06-28 | 2012-11-20 | Oracle International Corporation | Automatically finding acronyms and synonyms in a corpus |
US8812435B1 (en) | 2007-11-16 | 2014-08-19 | Google Inc. | Learning objects and facts from documents |
US8024281B2 (en) | 2008-02-29 | 2011-09-20 | Red Hat, Inc. | Alpha node hashing in a rule engine |
2005
- 2005-05-31: US application US11/142,853, published as US7769579B2 (en); not active, Expired - Fee Related
2006
- 2006-03-31: US application US11/394,414, published as US8825471B2 (en); active
- 2006-04-07: US application US11/399,857, published as US20070143317A1 (en); not active, Abandoned
- 2006-05-18: CA application CA2610208A, published as CA2610208C (en); active
- 2006-05-18: WO application PCT/US2006/019807, published as WO2006132793A2 (en); active, Application Filing
- 2006-05-18: CN application CN2006800280576A, published as CN101253498B (zh); active
- 2006-05-18: EP application EP06784449A, published as EP1891557A2 (en); not active, Withdrawn
2014
- 2014-08-14: US application US14/460,117, published as US9558186B2 (en); active
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
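CN102200983A (zh) * | 2010-03-25 | 2011-09-28 | NEC (China) Co., Ltd. | Attribute extraction device and method |
CN102662986A (zh) * | 2012-01-13 | 2012-09-12 | Institute of Computing Technology, Chinese Academy of Sciences | System and method for retrieving microblog messages |
CN105488105A (zh) * | 2015-11-19 | 2016-04-13 | Baidu Online Network Technology (Beijing) Co., Ltd. | Method for establishing an information extraction template, and method and device for processing knowledge data |
CN105488105B (zh) * | 2015-11-19 | 2019-11-05 | Baidu Online Network Technology (Beijing) Co., Ltd. | Method for establishing an information extraction template, and method and device for processing knowledge data |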
Also Published As
Publication number | Publication date |
---|---|
WO2006132793A3 (en) | 2007-02-08 |
US20060293879A1 (en) | 2006-12-28 |
US20140372473A1 (en) | 2014-12-18 |
US7769579B2 (en) | 2010-08-03 |
CA2610208C (en) | 2012-07-10 |
CA2610208A1 (en) | 2006-12-14 |
US20070150800A1 (en) | 2007-06-28 |
CN101253498B (zh) | 2010-12-08 |
US9558186B2 (en) | 2017-01-31 |
US8825471B2 (en) | 2014-09-02 |
US20070143317A1 (en) | 2007-06-21 |
EP1891557A2 (en) | 2008-02-27 |
WO2006132793A2 (en) | 2006-12-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN101253498B (zh) | Learning facts from semi-structured text | |
US7831545B1 (en) | Identifying the unifying subject of a set of facts | |
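CN102402604B (zh) | Efficient forward ranking in a search engine | |
CN102687138B (zh) | Search suggestion clustering and presentation | |
CN101185074B (zh) | User interface for facts query engine with snippets from information sources that include query terms and answer terms | |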
US8046681B2 (en) | Techniques for inducing high quality structural templates for electronic documents | |
US8707167B2 (en) | High precision data extraction | |
US9323731B1 (en) | Data extraction using templates | |
US20090125529A1 (en) | Extracting information based on document structure and characteristics of attributes | |
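CN100514323C (zh) | System and method for automatically extracting subtitle information | |
CN102662969B (zh) | Method for locating Internet information objects based on web page structure semantics | |
CN106294535B (zh) | Website identification method and device | |
CN103425687A (zh) | Keyword-based retrieval method and system | |
JP2010501096A (ja) | Joint optimization of wrapper generation and template detection | |
CN102122295A (zh) | Document search engine including highlighting of confident results | |
CN101118555A (zh) | System and method for generating associated information for keywords | |
CN101128820A (zh) | Document segmentation based on visual gaps | |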
US20080281827A1 (en) | Using structured database for webpage information extraction | |
US7421416B2 (en) | Method of managing web sites registered in search engine and a system thereof | |
US20030177115A1 (en) | System and method for automatic preparation and searching of scanned documents | |
US11222013B2 (en) | Custom named entities and tags for natural language search query processing | |
CN112395418B (zh) | Method, device, and electronic equipment for extracting target objects from web pages | |
Biagioli et al. | The NIR project: Standards and tools for legislative drafting and legal document web publication | |
CN109948015B (zh) | Method and system for extracting metasearch list results | |
CN112989142A (zh) | System, method, and device for processing configurable labels |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
CP01 | Change in the name or title of a patent holder | Address after: California, USA. Patentee after: Google LLC. Address before: California, USA. Patentee before: Google Inc. |