WO2010035412A1 - Information analysis device, information analysis method, and program - Google Patents
Information analysis device, information analysis method, and program
- Publication number
- WO2010035412A1 (PCT/JP2009/004399)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- link
- language expression
- time
- information
- electronic documents
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/31—Indexing; Data structures therefor; Storage structures
- G06F16/316—Indexing structures
- G06F16/319—Inverted lists
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/955—Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
- G06F16/9558—Details of hyperlinks; Management of linked annotations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
Definitions
- the present invention relates to an information analysis device, an information analysis method, and a program, and more particularly, to an information analysis device, an information analysis method, and a program that analyze a relationship between language expressions based on time information.
- Patent Document 1 uses a technique for detecting and presenting information having a correlation with information that the user wants to know. Specifically, in Patent Document 1, first, a keyword that co-occurs frequently with a keyword of interest at a certain point in time and has a similar appearance time is detected. Then, a co-occurrence graph is generated in which the keyword of interest and the detected keyword are displayed. By analyzing this co-occurrence graph, the user can know the reason why the keyword attracting attention is attracting attention.
- A language expression may be the character string itself as it appears in the text, or it may be obtained by analyzing the text with existing natural language processing techniques such as morphological analysis, syntax analysis, dependency analysis, and synonym processing.
- For example, "cigarettes are harmful to health" is one language expression, and "tobacco is harmful" is likewise a language expression that represents a single unit of meaning.
- An object of the present invention is to solve the above-described problems and to provide an information analysis device, an information analysis method, and a program that can appropriately evaluate the correlation between a plurality of language expressions to be analyzed without being affected by accidental causes.
- An information analysis apparatus according to the present invention analyzes a plurality of language expressions and includes a link information generation unit and a correlation value calculation unit.
- the link information generation unit extracts time information included in each of the plurality of electronic documents and a relationship between the electronic documents in the plurality of electronic documents from a plurality of electronic documents including at least one of the plurality of language expressions.
- The correlation value calculation unit specifies, from the link information, the number of appearances of links between the one language expression and the other language expression and the appearance time of each link. Using the specified number of appearances and the appearance time of each link, it calculates the correlation value between the two language expressions according to the degree to which the links appear continuously.
- An information analysis method for analyzing a plurality of language expressions includes the steps of:
  (a) extracting, from a plurality of electronic documents each including at least one of the plurality of language expressions, the time information of each electronic document and the relationships between the electronic documents;
  (b) detecting, based on the time information and the relationships extracted in step (a), a link between one language expression and another language expression together with the appearance time of the link, and generating link information identifying the detected link and its appearance time; and
  (c) specifying, from the link information generated in step (b), the number of appearances of links between the two language expressions and the appearance time of each link, and calculating the correlation value between the two language expressions according to the degree to which the links appear continuously, using the specified number of appearances and the appearance time of each link.
- A program according to the present invention causes a computer to perform information analysis for analyzing a plurality of language expressions by executing the steps of:
  (a) extracting, from a plurality of electronic documents each including at least one of the plurality of language expressions, the time information of each electronic document and the relationships between the electronic documents;
  (b) detecting, based on the time information and the relationships extracted in step (a), a link between one language expression and another language expression together with the appearance time of the link, and generating link information identifying the detected link and its appearance time; and
  (c) specifying, from the link information generated in step (b), the number of appearances of links between the two language expressions and the appearance time of each link, and calculating the correlation value between the two language expressions according to the degree to which the links appear continuously, using the specified number of appearances and the appearance time of each link.
- the correlation between a plurality of language expressions to be analyzed is appropriately evaluated without being affected by an accidental cause.
- FIG. 1 is a block diagram showing a schematic configuration of the information analysis apparatus according to Embodiment 1 of the present invention.
- FIG. 2 is a diagram illustrating an example of information stored in the storage device illustrated in FIG. 1.
- FIG. 3 is a diagram showing an example of link information generated in the first embodiment of the present invention.
- FIG. 4 is a diagram showing another example of link information generated in the first embodiment of the present invention.
- FIG. 5 is a flowchart showing the flow of processing in the information analysis method according to Embodiment 1 of the present invention.
- FIG. 6 is a diagram showing a computer device capable of realizing the information analysis device 1 shown in FIG. 1.
- FIG. 7 is a block diagram showing a schematic configuration of the information analysis apparatus according to Embodiment 2 of the present invention.
- FIG. 8 is a flowchart showing the flow of processing in the information analysis method according to Embodiment 2 of the present invention.
- the information analysis apparatus 1 is an apparatus for analyzing a plurality of language expressions, and is used for analyzing the correlation between one language expression and another language expression. As illustrated in FIG. 1, the information analysis apparatus 1 includes a link information generation unit 3 that generates link information and a correlation value calculation unit 4 that calculates a correlation value between language expressions.
- The link information generation unit 3 extracts, from a plurality of electronic documents each including at least one of the plurality of language expressions, the time information of each electronic document and the relationships between the electronic documents.
- Based on the extracted time information and the relationships between the electronic documents, the link information generation unit 3 detects a link between one language expression and another language expression among the plurality of language expressions. It then generates link information specifying the detected link and the appearance time of the link.
- The correlation value calculation unit 4 specifies, from the link information, the number of appearances of links between one language expression and another language expression and the appearance time of each link. It then uses the specified number of appearances and the appearance time of each link to calculate the correlation value between the two language expressions according to the degree to which the links appear continuously.
- the “degree that the link continuously appears” is a degree that represents the strength of whether or not the link exists continuously over a long period of time.
- the link between the language expressions is detected in consideration of not only the time information related to the language expression but also the relationship between documents including each language expression. Further, a correlation value is calculated using such a link, and the correlation is determined. For this reason, according to the information analysis device 1, a situation in which the correlation value increases due to an accidental cause and an error occurs in the determination of the correlation is avoided.
- the information analysis apparatus 1 further includes an input unit 2 that receives an input of a language expression to be analyzed.
- A storage device 10, an input device 12, and an output device 13 are connected to the information analysis device 1.
- the input device 12 is connected to the input unit 2 of the information analysis device 1 from the outside, and inputs information such as language expression to be analyzed.
- Specific examples of the input device 12 include a keyboard and a mouse.
- the output device 13 is a device for outputting the analysis result.
- Specific examples of the output device 13 include a display device such as a liquid crystal display, a printer, and the like.
- the input device 12 and the output device 13 may be attached to another computer device connected to the information analysis device 1 via a network.
- the storage device 10 has a document storage unit 11 in a storage area, and is used for link information generation by the link information generation unit 3.
- the storage device 10 can be realized by storing a data file in a storage device such as a hard disk (magnetic disk storage device) or by mounting a recording medium storing the data file on a reading device.
- the storage device 10 may be directly connected to the information analysis device 1 or may be provided in another computer device connected to the information analysis device 1 via a network.
- the document storage unit 11 stores a large number of electronic documents.
- the storage area of the storage device 10 functioning as the document storage unit 11 is schematically expressed in a table format, but the storage area can also be expressed in another format.
- each horizontal row represents one electronic document stored in the document storage unit 11.
- the document storage unit 11 stores an identifier (document ID), time information, and reference document ID of each electronic document in addition to the document contents. These pieces of information are associated with each electronic document and stored as one set data.
- date information for specifying the date is stored as time information.
- the document with the document ID “10001” is stored in a state associated with the date represented by “2004/4/15”.
- the date is used as the time information, but the first embodiment is not limited to this example.
- time information that specifies the hour, minute, and second in addition to the year, month, and day may be used.
- a time obtained by integrating unit times such as seconds on the basis of a certain time point may be used as time information.
- the reference document ID is set based on a reference relationship between stored electronic documents. Specifically, when the electronic document is extracted from a Web page on the Internet, the reference document ID is acquired from the hyperlink information. That is, the document ID of the electronic document extracted from the linked web page described in the original web page is used. Further, when the electronic document is described in the HTML format, the document ID and the reference ID may be represented by a URL.
- the reference document ID may be set based on a logical relationship between stored electronic documents.
- As the logical relationship between electronic documents, a similarity relationship, a conflict relationship, or the like can be adopted.
- the reference document ID may be set from a similar relationship between stored electronic documents, or may be set from a conflict relationship between stored electronic documents. In the former case, the document ID of another electronic document whose contents are semantically similar is stored as the reference document ID.
- Whether or not the contents are semantically similar can be determined by extracting a document vector in units of morphemes from each electronic document and using these document vectors to calculate the cosine similarity between the electronic documents. For example, an electronic document whose similarity exceeds a preset threshold value may be treated as a similar electronic document, or the several electronic documents with the highest similarity may be treated as similar electronic documents.
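The similarity test described above can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: whitespace splitting stands in for morphological analysis, and the function names, the sample texts, and the threshold of 0.5 are all assumptions made for the example.

```python
import math
from collections import Counter

def document_vector(text):
    # Stand-in for morphological analysis: split on whitespace and count tokens.
    return Counter(text.lower().split())

def cosine_similarity(u, v):
    # Cosine of the angle between two sparse term-count vectors.
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def similar_documents(doc_id, docs, threshold=0.5):
    """Return IDs of other documents whose similarity to doc_id exceeds threshold."""
    target = document_vector(docs[doc_id])
    return [other for other, text in docs.items()
            if other != doc_id
            and cosine_similarity(target, document_vector(text)) > threshold]

# Hypothetical document contents keyed by document ID.
docs = {
    "10001": "tobacco is harmful to health",
    "10102": "tobacco is harmful to the lungs",
    "11003": "stock prices rose sharply today",
}
reference_ids = similar_documents("10001", docs)
```

The returned IDs would play the role of the reference document IDs stored for document "10001". Taking the top few documents instead of applying a threshold would only change the final selection step.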
- In the latter case, a negative expression is added to a characteristic language expression in the electronic document to generate a language expression that conflicts with it. An electronic document including the conflicting language expression is then extracted, and the document ID of the extracted electronic document is stored as the reference document ID.
- the method for extracting an electronic document having a logical relationship is not limited to the above example, and can be implemented in various modes without departing from the gist of the present invention.
- When a language expression to be analyzed is input from the input unit 2, the link information generation unit 3 first accesses the document storage unit 11 of the storage device 10 and searches for electronic documents that contain the input language expression.
- The link information generation unit 3 can search not only for electronic documents containing a language expression that matches the input as a character string, but also for electronic documents containing a synonymous language expression (synonymous expression) with the same semantic content. Examples of synonymous expressions include language expressions that match the input language expression in terms of syntactic structure, and language expressions obtained by replacing a part of the input language expression with synonyms.
- The link information generation unit 3 then extracts, from the plurality of electronic documents specified by the search, the time information of each electronic document and the relationships between the electronic documents.
- the link information generation unit 3 extracts the reference document ID shown in FIG. 2 as the relationship between the electronic documents. Specifically, the time information of each electronic document with document IDs “10001”, “10102”, “11003”, and “12004” specified by the search and the reference document ID are extracted.
- the link information generation unit 3 detects a link between the language expression X and the language expression Y based on the extracted time information and the reference document ID. Further, in the first embodiment, in order to increase the accuracy of the correlation value described later, the link information generation unit 3 specifies the appearance time of the link in addition to the detected link when detecting the link.
- the document ID “10001” specified by the search is described as the reference document ID of the document ID “10102” specified by the search.
- the link information generation unit 3 detects one link between the language expression X and the language expression Y from the document IDs “10001” and “10102”.
- the time information of the electronic document with the document ID “10001” and the time information of the electronic document with the document ID “10102” are associated with each other and detected as the appearance time of the link.
- the link information generation unit 3 detects one link between the language expression X and the language expression Y from the document IDs “11003” and “12004”.
- the time information of the electronic document with the document ID “11003” and the time information of the electronic document with the document ID “12004” are also associated with each other, and the time included therein is detected as the appearance time of the link.
- After that, the link information generation unit 3 generates link information that identifies the detected links and the appearance time of each link. Specifically, in the first embodiment, as shown in FIG. 3, the link information generation unit 3 generates link information between the language expression X and the language expression Y. In FIG. 3, the link information between the language expression X and the language expression Y is schematically expressed in a tabular format, but the link information can also be expressed in another format.
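The link detection described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions: substring matching stands in for the document search, the table layout and the function name `detect_links` are hypothetical, and only the date of document 10001 comes from the text; all other dates, texts, and document 13005 are invented for the example.

```python
from datetime import date

# Hypothetical rows modelled on the document storage unit 11. Only the date of
# document 10001 comes from the text; every other value is invented.
documents = {
    "10001": {"date": date(2004, 4, 15), "refs": [], "text": "tobacco sales report"},
    "10102": {"date": date(2004, 4, 20), "refs": ["10001"], "text": "new cancer study"},
    "11003": {"date": date(2004, 6, 1), "refs": [], "text": "tobacco tax debate"},
    "12004": {"date": date(2004, 6, 3), "refs": ["11003"], "text": "cancer statistics"},
    "13005": {"date": date(2004, 6, 9), "refs": ["10001"], "text": "weather forecast"},
}

def detect_links(documents, expr_x, expr_y):
    """Return one link per reference pair where one document contains X and the other Y."""
    links = []
    for doc_id, doc in documents.items():
        for ref_id in doc["refs"]:
            if ref_id not in documents:
                continue  # the referenced document is not stored
            # Try both assignments of the pair to (X side, Y side).
            for x_id, y_id in ((doc_id, ref_id), (ref_id, doc_id)):
                if expr_x in documents[x_id]["text"] and expr_y in documents[y_id]["text"]:
                    links.append({
                        "doc_x": x_id, "time_x": documents[x_id]["date"],
                        "doc_y": y_id, "time_y": documents[y_id]["date"],
                    })
                    break  # count each reference pair at most once
    return links

link_info = detect_links(documents, "tobacco", "cancer")
```

Each entry of `link_info` corresponds to one row of a table like the one in FIG. 3: a link plus the appearance time on the X side and on the Y side. Document 13005 references 10001 but contains neither expression pairing, so it yields no link.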
- each row in the horizontal direction represents one link, and the number of occurrences of the link is N (n and N are arbitrary natural numbers satisfying n ≤ N).
- the appearance time indicating when each link appears in the language expression is associated with each link for each language expression. This appearance time corresponds to the time included in the time information of the electronic document including each language expression.
- a link 1 represents a link between the electronic document with the document ID “10001” and the electronic document with the document ID “10102”.
- For link 1, the appearance time of the link in the language expression X matches the time included in the time information of the electronic document with the document ID “10001”, and the appearance time of the link in the language expression Y matches the time included in the time information of the electronic document with the document ID “10102”.
- Information other than time information may also be associated with each link. For example, when a link has directionality, information indicating the language expression from which the link originates (link source) and information indicating the language expression to which it points (link destination) may be associated with the link.
- each link is associated with the appearance time for each language expression, but may be associated with only the appearance time of one of the language expressions.
- the link information generation unit 3 can obtain a representative time such as an intermediate time between the appearance time of one language expression and the appearance time of the other language expression. In this case, the obtained representative time can be used as the link appearance time and associated with the link. As described above, when one appearance time (representative time) is associated with one link, the processing speed in the link information generation unit 3 can be improved.
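A representative time of this kind can be sketched in Python as follows. The function name `representative_time` and the sample timestamps are assumptions for illustration; the midpoint is only one possible choice of representative time.

```python
from datetime import datetime

def representative_time(time_x, time_y):
    """Midpoint between the two appearance times of one link."""
    earlier, later = sorted((time_x, time_y))
    return earlier + (later - earlier) / 2

# Hypothetical appearance times for the X side and the Y side of one link.
midpoint = representative_time(datetime(2004, 4, 15), datetime(2004, 4, 20))
```

Storing a single timestamp per link means later steps only need to sort and subtract one value per link, which is where the speed benefit mentioned above would come from.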
- the reference document ID can be set from a similar relationship between stored electronic documents or can be set from a conflict relationship between stored electronic documents.
- the link information generation unit 3 extracts a semantic similarity relationship between electronic documents, and based on this, extracts a link between linguistic expressions.
- the link information generation unit 3 extracts a semantic conflict between electronic documents, and based on this, extracts a link between linguistic expressions.
- Link information between language expressions can thus be generated even when there is no direct reference relationship between electronic documents.
- the correlation value calculation unit 4 specifies the number N of appearances of links from the link information shown in FIG. 3 and also specifies the appearance time of each link. Then, the correlation value calculation unit 4 calculates the correlation value R between the linguistic expressions according to the degree of the continuous appearance of the link, using the number of appearances N of the links and the appearance time of each link. Specifically, the correlation value calculation unit 4 can calculate the correlation value R between language expressions using the following equation (1).
- f(N) is a function that increases in accordance with the value of the appearance number N used for calculation.
- ⁇ is a weighting coefficient. For example, a value normalized by N, which is the maximum value of the number of appearances of links used for calculation, can be used as ⁇ . ⁇ thus obtained corrects the influence of f (N) on the correlation value.
- dt_max represents the difference between the appearance time of the latest link and the appearance time of the first link.
- G(dt_max) is a function that increases according to the value of dt_max and takes into account the degree of continuous appearance of the link.
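Equation (1) itself is not reproduced in this text, so the following Python sketch only illustrates the ingredients named above. It assumes, purely for illustration, that the weighting coefficient, f(N), and G(dt_max) are multiplied together and that both functions are logarithmic; neither choice is confirmed by the source, and the function name and sample data are hypothetical.

```python
import math

def correlation_eq1_sketch(appearance_days, weight=1.0):
    """Illustrative score that grows with the link count N and the time span dt_max.

    appearance_days: appearance time of each link as a number (e.g. day index).
    The product form and the log-based f and G are assumptions of this sketch.
    """
    n = len(appearance_days)
    if n == 0:
        return 0.0
    dt_max = max(appearance_days) - min(appearance_days)  # latest minus first link
    f_n = math.log1p(n)        # assumed increasing function of N
    g_dt = math.log1p(dt_max)  # assumed increasing function of dt_max
    return weight * f_n * g_dt

bursty = correlation_eq1_sketch([0, 1, 2, 3])            # four links within three days
sustained = correlation_eq1_sketch([0, 100, 200, 300])   # four links over 300 days
```

With the same number of links, the links spread over the longer period receive the higher score, which matches the stated aim of rewarding links that persist over time.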
- the correlation value calculation unit 4 can also calculate the correlation value R between language expressions using the following equation (2).
- F(N) in the above equation (2) is a function similar to f(N) used in the above equation (1).
- ⁇ is a constant correction value that is not 0 (zero).
- V is a value representing the degree of continuous appearance of a link using a time interval between adjacent links and a variance relating to the appearance density of links. For example, it can be obtained by the following formula (3) or the following formula (4).
- the following equation (3) is a variance regarding the time interval of the appearance time between links.
- dT represents the average value obtained by arranging the appearance times (representative values) of the links obtained from the link information in time series and averaging the time intervals between adjacent links.
- dt_n represents the difference between the appearance time of the n-th link and the appearance time of the (n+1)-th link.
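The variance of the time intervals defined by dT and dt_n can be sketched in Python as follows. Equation (3) is not reproduced in this text, so the normalisation (dividing by the number of intervals) is an assumption, as are the function name and the sample data.

```python
def interval_variance(times):
    """Variance of the gaps between adjacent links, with times as plain numbers."""
    ordered = sorted(times)
    intervals = [b - a for a, b in zip(ordered, ordered[1:])]  # dt_n values
    if not intervals:
        return 0.0
    mean_interval = sum(intervals) / len(intervals)            # dT
    return sum((dt - mean_interval) ** 2 for dt in intervals) / len(intervals)

evenly_spaced = interval_variance([0, 10, 20, 30])  # gaps 10, 10, 10
bursty = interval_variance([0, 1, 2, 30])           # gaps 1, 1, 28
```

Evenly spaced links give a variance of zero, while a burst followed by a long gap gives a large value.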
- V may be replaced with the standard deviation by calculating the square root of V. Further, “V” can also be obtained by the following equation (4) by using the dispersion relating to the appearance density of the links at a predetermined time interval.
- In the above formula (4), m is the number assigned to each section obtained when the period between the appearance time of the first link between the language expressions to be calculated and the appearance time of the latest link is divided at a predetermined time interval. M represents the number of sections. dq_m represents the number of links in the m-th section; that is, dq_m indicates the appearance density of links.
- dQ represents the average of dq_m, that is, the average appearance density of links.
- When N is significantly larger than M, the processing speed can be improved.
- V may be replaced with the standard deviation by calculating the square root of V.
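The density-based variance can be sketched in Python as follows. Equation (4) is not reproduced in this text, so dividing by the number of sections M is an assumption, and the function name, the zero-based section numbering, and the sample data are hypothetical.

```python
def density_variance(times, section_width):
    """Variance of the per-section link counts dq_m over M fixed-width sections."""
    if not times:
        return 0.0
    start = min(times)
    span = max(times) - start
    m_sections = int(span // section_width) + 1   # M, including the last section
    counts = [0] * m_sections                     # dq_m for each section
    for t in times:
        counts[int((t - start) // section_width)] += 1
    mean_density = sum(counts) / m_sections       # dQ
    return sum((dq - mean_density) ** 2 for dq in counts) / m_sections

even_density = density_variance([0, 10, 20, 30], section_width=10)  # counts 1, 1, 1, 1
uneven_density = density_variance([0, 1, 2, 30], section_width=10)  # counts 3, 0, 0, 1
```

Because the final sum runs over M sections rather than over every interval between N links, the work after counting is proportional to M.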
- The above formula (2) has the effect of weakening the influence of an uneven distribution of links. For example, for a link between a language expression of interest and a language expression that occurs suddenly, the time interval between adjacent links varies significantly, and the difference between the maximum and minimum time intervals is large, so V takes a large value. If the correlation value R is calculated by applying the above equation (2) to such language expressions, the correlation value R becomes low. For this reason, the above formula (2) is effective when only links between language expressions that have an important relationship are to be emphasized.
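The damping effect described above can be illustrated with the following Python sketch. Equation (2) is not reproduced in this text; the sketch assumes a form in which a count-based term is divided by the interval variance V plus a non-zero correction constant, which is one way to obtain a low R when V is large. The function name, the logarithmic count term, and the default constant are all assumptions.

```python
import math

def correlation_eq2_sketch(times, correction=1.0):
    """Illustrative score: count term divided by (interval variance + correction)."""
    ordered = sorted(times)
    intervals = [b - a for a, b in zip(ordered, ordered[1:])]
    if intervals:
        mean_interval = sum(intervals) / len(intervals)
        v = sum((dt - mean_interval) ** 2 for dt in intervals) / len(intervals)
    else:
        v = 0.0
    # The non-zero correction keeps the denominator positive when V is zero.
    return math.log1p(len(ordered)) / (v + correction)

steady = correlation_eq2_sketch([0, 10, 20, 30])  # V = 0
sudden = correlation_eq2_sketch([0, 1, 2, 30])    # V = 162
```

Both inputs contain four links, but the unevenly distributed ones score far lower under this assumed form.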
- the correlation value calculation unit 4 can also calculate a correlation value R between language expressions using the following equation (5).
- F(N) in the above equation (5) is a function similar to f(N) used in the above equation (1).
- ⁇ is a constant correction value that is not 0 (zero).
- H (P) is the entropy of the probability distribution P of links between linguistic expressions, and can be obtained by the following equation (6).
- The number of appearances N_m of links in the m-th section can be determined, for example, when two appearance times are specified for one link, by focusing on the earlier appearance time and counting the number of links that appear within the m-th section.
- Alternatively, when a representative time such as the intermediate time between the two appearance times has been determined, the links may be counted using the representative time instead of the earlier appearance time.
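The entropy of the link distribution can be sketched in Python as follows. Equation (6) is not reproduced in this text, so the sketch assumes the standard Shannon form with probabilities N_m / N and the natural logarithm; the function name, zero-based section numbering, and sample data are hypothetical.

```python
import math

def link_entropy(links, section_width):
    """Shannon entropy of the distribution of links over fixed-width sections.

    links: list of (time_x, time_y) pairs given as plain numbers (e.g. day index).
    """
    if not links:
        return 0.0
    times = [min(tx, ty) for tx, ty in links]  # focus on the earlier appearance time
    start = min(times)
    n_sections = int((max(times) - start) // section_width) + 1
    counts = [0] * n_sections                  # N_m for each section
    for t in times:
        counts[int((t - start) // section_width)] += 1
    total = len(times)
    # Sections without links contribute nothing to the entropy.
    return sum(-(c / total) * math.log(c / total) for c in counts if c > 0)

spread_out = link_entropy([(0, 2), (10, 11), (20, 25), (30, 31)], section_width=10)
concentrated = link_entropy([(0, 1), (1, 2), (2, 3)], section_width=10)
```

Links spread evenly over the sections give the maximum entropy for that number of sections, while links concentrated in a single section give zero. Counting by a representative time instead would only change how `times` is built.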
- the above formula (5) is effective when obtaining a correlation value for each of the language expression A and the language expression B, and further obtaining the correlation between the language expression A and the language expression B.
- the language expression A and the language expression B are the same in the variance calculated from the number of links and the time intervals between all links.
- Because the bias in the distribution of links is taken into account when calculating the correlation value between language expressions, the correlation between them can be obtained accurately.
- The similarity S described later, which is calculated from the semantic similarity between the electronic documents related to each link, can be used as a weight.
- the correlation value R ′ can be calculated using the following equation (7).
- R is the correlation value R of the above equations (1), (2), and (5).
- the similarity S in the above equation (7) can be calculated using, for example, the following equation (8).
- The above equation (8) represents the arithmetic mean of the semantic similarity function sim(DX_n, DY_n) calculated between the electronic documents related to the n-th link between the language expression X and the language expression Y.
- The similarity function sim(DX_n, DY_n) can be calculated based on the vector space model. For example, let DX_n be the feature vector of the document related to the language expression X in the n-th link, and let DY_n be the feature vector of the document related to the language expression Y in the n-th link. In this case, sim(DX_n, DY_n) can be calculated as the cosine of the angle formed by the two feature vectors.
- Words in the electronic document including the language expression X can be adopted as components of the feature vector DX_n, and words in the electronic document including the language expression Y as components of the feature vector DY_n.
- The elements of the feature vectors DX_n and DY_n may take values weighted by the appearance frequency tf (term frequency) of each word in the electronic document and by idf (inverse document frequency), the reciprocal of the word's appearance frequency across all target electronic documents.
- The similarity function sim(DX_n, DY_n) can also be calculated using any known similarity measure between documents.
- The calculation methods for the above equation (8) and the similarity function sim(DX_n, DY_n) are not limited to those described above.
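The averaged similarity S can be sketched in Python as follows. This is an illustration under stated assumptions: documents are given as pre-tokenised word lists, idf is taken as the plain reciprocal of the document-frequency ratio as worded above (many implementations use its logarithm instead), and all function names and sample data are hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(token_lists):
    """Weight each word by tf * idf, with idf = (number of documents) / (document frequency)."""
    n_docs = len(token_lists)
    doc_freq = Counter(t for tokens in token_lists for t in set(tokens))
    return [{t: tf * (n_docs / doc_freq[t]) for t, tf in Counter(tokens).items()}
            for tokens in token_lists]

def cosine(u, v):
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def mean_link_similarity(link_pairs):
    """Arithmetic mean of sim(DX_n, DY_n) over all links.

    link_pairs: one (tokens of X-side document, tokens of Y-side document) per link.
    """
    if not link_pairs:
        return 0.0
    vectors = tfidf_vectors([tokens for pair in link_pairs for tokens in pair])
    sims = [cosine(vectors[2 * i], vectors[2 * i + 1]) for i in range(len(link_pairs))]
    return sum(sims) / len(sims)

# Hypothetical tokenised documents for two links.
pairs = [
    (["tobacco", "harm"], ["tobacco", "harm"]),   # identical documents
    (["tobacco", "tax"], ["weather", "rain"]),    # no shared words
]
similarity_s = mean_link_similarity(pairs)
```

The first link contributes a similarity of about 1 and the second contributes 0, so the mean is about 0.5.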
- the similarity may be calculated in the mth section and used in the above equations (4) and (6).
- The similarity S_m in the m-th section can be calculated using the following equation (9).
- N_m in the above equation (9) is the number of appearances of links in the m-th section.
- k represents the order of the links appearing in the m-th section.
- The similarity function sim(DX_k, DY_k) may be calculated in the same manner as in the above equation (8).
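The per-section similarity S_m can be sketched in Python as follows. Equation (9) is not reproduced in this text; the sketch assumes it is the mean of the link similarities within each section, by analogy with equation (8). The function name, zero-based section numbering, and sample values are hypothetical, and each link's similarity is assumed to have been computed beforehand.

```python
def section_similarities(links, section_width):
    """Mean similarity per section, keyed by zero-based section index.

    links: list of (appearance_time, similarity) pairs, times as plain numbers.
    """
    if not links:
        return {}
    start = min(t for t, _ in links)
    sums, counts = {}, {}
    for t, sim in links:
        m = int((t - start) // section_width)
        sums[m] = sums.get(m, 0.0) + sim
        counts[m] = counts.get(m, 0) + 1          # N_m
    return {m: sums[m] / counts[m] for m in sums}  # S_m

s_by_section = section_similarities([(0, 0.5), (5, 0.25), (25, 1.0)], section_width=10)
```

Sections that contain no links are simply absent from the result, so a caller would need to decide how to treat them before combining S_m with the per-section counts.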
- When the above equation (9) is applied to the above equation (4), dq_m in equation (4) may be calculated, for example, by the following equation (10).
- FIG. 5 is a flowchart showing the flow of processing in the information analysis method according to Embodiment 1 of the present invention.
- the information analysis method according to the first embodiment is implemented by operating the information analysis apparatus 1 according to the first embodiment shown in FIG. For this reason, the following description will be described together with the operation of the information analysis apparatus 1 with appropriate reference to FIG.
- the input unit 2 receives an input of a plurality of language expressions to be analyzed (step A1).
- the input unit 2 receives input of the language expression X and the language expression Y and inputs them to the link information generation unit 3.
- the link information generation unit 3 accesses the document storage unit 11 of the storage device 10 and searches for an electronic document including the input language expression (step A2).
- an electronic document including at least one of the language expression X and the language expression Y is searched.
- the link information generation unit 3 extracts time information possessed by each of the plurality of electronic documents and a relationship between the electronic documents in the plurality of electronic documents from the plurality of electronic documents specified by the search. (Step A3).
- As the relationship between electronic documents, the reference document ID (see FIG. 2) set in advance from the reference relationship between electronic documents is extracted.
- the link information generation unit 3 detects a link between language expressions based on the time information extracted in step A3 and the relationship between the electronic documents, and generates link information (step A4).
- the link between the language expression X and the language expression Y is detected, and the appearance time of the link is also detected.
- link information specifying the detected link and its appearance time is generated (see FIGS. 3 and 4).
- the link information generation unit 3 inputs link information to the correlation value calculation unit 4.
- the correlation value calculation unit 4 specifies the number of appearances of links between language expressions from the link information generated in step A4, and calculates the correlation value R using the number of appearances of links (step A5).
- the correlation value is calculated using the appearance time of each link in addition to the number of appearances of the link.
- the correlation value R is calculated using any one or a combination of the above-described formulas (1), (2), (5), and (7).
- the correlation value calculation unit 4 outputs the calculated correlation value R to the output device 13.
- the processing in the information analysis device 1 ends.
- the information analysis apparatus 1 is in a stopped state until the next language expression is input.
- the link between the language expressions is detected in consideration of not only the time information related to the language expression but also the relationship between the documents including each language expression. Then, the correlation value is calculated using such a link. Therefore, the reliability of the correlation value becomes high, and a situation in which an error occurs in the determination of the correlation due to an accidental cause is avoided.
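As an illustration of steps A2 to A4, the following Python sketch detects links from reference relationships between documents. The `Document` class, its field names, the plain substring matching, and the choice of the midpoint between the two document times as the appearance time are all assumptions made for this example; the actual data layout and link information are defined by FIGS. 2 to 4, which are not reproduced here.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Document:
    doc_id: str                                  # hypothetical document identifier
    time: date                                   # time information of the document
    text: str
    refs: list = field(default_factory=list)     # IDs of the documents this one references


def detect_links(docs, expr_x, expr_y):
    """Return one link record per reference whose two ends contain X and Y."""
    by_id = {d.doc_id: d for d in docs}
    links = []
    for d in docs:
        for ref_id in d.refs:
            other = by_id.get(ref_id)
            if other is None:
                continue
            # A link is detected when one end of the reference contains X
            # and the other end contains Y.
            if expr_x in d.text and expr_y in other.text:
                x_doc, y_doc = d, other
            elif expr_y in d.text and expr_x in other.text:
                x_doc, y_doc = other, d
            else:
                continue
            # Assumed appearance time: midpoint between the two document times.
            earlier, later = sorted((x_doc.time, y_doc.time))
            appearance = earlier + (later - earlier) / 2
            links.append({"x_doc": x_doc.doc_id, "y_doc": y_doc.doc_id, "time": appearance})
    return links


docs = [
    Document("d1", date(2008, 9, 1), "the gel is effective"),
    Document("d2", date(2008, 9, 5), "the gel is not effective", refs=["d1"]),
    Document("d3", date(2008, 9, 6), "unrelated note", refs=["d1"]),
]
links = detect_links(docs, "gel is effective", "gel is not effective")
```

Here only the reference from `d2` to `d1` yields a link, because `d3` contains neither expression. Two documents that merely mention X and Y at about the same time, without any relationship between them, produce no link, which is the point made above about avoiding accidental correlations.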
- the information analysis apparatus 1 can be realized by installing a program that can execute steps A1 to A5 shown in FIG. 5 in a computer and executing the program. This point will be described with reference to FIG.
- FIG. 6 is a diagram showing a computer device capable of realizing the information analysis device 1 shown in FIG.
- a computer device 20 includes a CPU (Central Processing Unit) 21, a RAM (Random Access Memory) 22, a ROM (Read Only Memory) 23, an interface circuit (I/F) 24, a magnetic disk storage device (hard disk) 25, a reading device 26, and a video card 27.
- the interface circuit 24 functions as the input unit 2.
- a keyboard 28 is used as an input device (see FIG. 1), and the keyboard 28 is connected to the interface circuit 24.
- a display device 29 is used as the output device (see FIG. 1), and the display device 29 is connected to the video card 27.
- a part of the storage area of the magnetic disk storage device 25 is used as the document storage unit 11 (see FIG. 1). A large number of electronic documents are stored in this partial storage area. Further, a program for causing the computer device 20 to execute steps A1 to A5 shown in FIG.
- the program stored in the recording medium 30 is installed in the computer device 20 via the reading device 26 constituted by an optical disk device or the like.
- the CPU 21 functions as the link information generation unit 3 and the correlation value calculation unit 4, and the information analysis apparatus 1 is realized.
- the document storage unit 11 (see FIG. 1) can also be realized by mounting a recording medium in which a large number of electronic documents are stored in the reading device 26. Further, the document storage unit 11 may be constructed in another computer device connected to the computer device 20 via a network.
- FIG. 7 is a block diagram showing a schematic configuration of the information analysis apparatus according to Embodiment 2 of the present invention.
- the information analysis device 5 in the second embodiment includes a language expression generation unit 6, and in this respect differs from the information analysis device 1 in the first embodiment (see FIG. 1).
- the difference between the information analysis apparatus 5 in the second embodiment and the information analysis apparatus 1 in the first embodiment will be described more specifically.
- the input unit 2 accepts input of one language expression of a pair of language expressions to be analyzed. The input unit 2 then inputs the accepted language expression to the language expression generation unit 6 as well as to the link information generation unit 3.
- the input language expression is referred to as “input language expression”.
- the language expression generation unit 6 generates another language expression related to the input language expression (hereinafter referred to as “related language expression”). In the present embodiment, the language expression generation unit 6 generates one related language expression for one input language expression. Further, the language expression generation unit 6 inputs the generated related language expression to the link information generation unit 3.
- the input language expression is the language expression X “An earthquake-resistant gel is effective”.
- the language expression generation unit 6 can add the negative expression “not” to the language expression X “An earthquake-resistant gel is effective”, adjust the conjugation, and thereby generate the related language expression “An earthquake-resistant gel is not effective”.
- the related language expression is not limited to the above example. The language expression generation unit 6 can also generate, for example, a language expression that opposes the input language expression as the related language expression. Furthermore, the language expression generation unit 6 can extract a language expression that co-occurs with the input language expression from the electronic documents stored in the document storage unit 11 and use the extracted language expression as the related language expression.
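The two generation strategies described above, negation and co-occurrence, can be sketched in Python as follows. This is a toy illustration only: the function names are hypothetical, the negation rule works on simple English sentences containing "is" (the original operates on Japanese and adjusts the conjugation), and co-occurrence is approximated by counting whitespace-separated words in documents that contain the input expression.

```python
from collections import Counter


def negate(expression):
    """Toy negation: insert 'not' after the first 'is' (assumed English-only rule)."""
    words = expression.split()
    if "is" in words:
        i = words.index("is")
        return " ".join(words[: i + 1] + ["not"] + words[i + 1:])
    return "not " + expression


def cooccurring_terms(documents, expression, top_n=3):
    """Return the words that most often share a document with the expression."""
    counts = Counter()
    for text in documents:
        if expression in text:
            # Count each distinct word once per document, excluding the expression itself.
            words = set(text.replace(expression, " ").lower().split())
            counts.update(words)
    # Sort by descending count, then alphabetically, so the result is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:top_n]]


related = negate("An earthquake-resistant gel is effective")
stored = [
    "seismic gel earthquake sale",
    "earthquake news: seismic gel",
    "cooking tips",
]
candidates = cooccurring_terms(stored, "seismic gel", top_n=1)
```

A real implementation would need morphological analysis to handle conjugation and word boundaries, which this sketch deliberately leaves out.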
- when the input language expression and the related language expression are input, the link information generation unit 3 generates link information for them. That is, the link information generation unit 3 first extracts, from the electronic documents including the input language expression or the related language expression, the time information possessed by each electronic document and the relationships between the electronic documents. The link information generation unit 3 then detects links between the input language expression and the related language expression based on the time information and the relationships between the electronic documents, and generates link information specifying the detected links.
- the correlation value calculation unit 4 specifies the number of appearances of links between the input language expression and the related language expression and the appearance time of each link. Using the specified number of appearances and the appearance time of each link, it calculates a correlation value between the input language expression and the related language expression in accordance with the degree to which the link continues to appear.
- the information analysis device 5 thus includes the language expression generation unit 6, so that the link information generation unit 3 and the correlation value calculation unit 4 also process the related language expression. In these respects it differs from the information analysis apparatus 1 in the first embodiment. In all other respects, the information analysis device 5 is configured in the same manner as the information analysis device 1.
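To make the idea of a correlation value that depends on "the degree to which the link continues to appear" concrete, the following Python sketch scores a set of link appearance times. It is not one of formulas (1), (2), (5), and (7), which are not reproduced in this section; the scoring rule, the function names, and the seven-day period are assumptions chosen only to show how the number of links and their appearance times can be combined.

```python
from datetime import date


def continuity(appearance_times, start, end, period_days=7):
    """Fraction of periods in [start, end] that contain at least one link."""
    total_periods = (end - start).days // period_days + 1
    occupied = {
        (t - start).days // period_days
        for t in appearance_times
        if start <= t <= end
    }
    return len(occupied) / total_periods


def correlation_value(appearance_times, start, end, period_days=7):
    """Hypothetical score: number of links weighted by how continuously they appear."""
    n_links = sum(1 for t in appearance_times if start <= t <= end)
    return n_links * continuity(appearance_times, start, end, period_days)


start, end = date(2008, 9, 1), date(2008, 9, 28)
# Four links spread over four consecutive weeks.
steady = [date(2008, 9, 1), date(2008, 9, 8), date(2008, 9, 15), date(2008, 9, 22)]
# Four links concentrated in the first week.
burst = [date(2008, 9, 1), date(2008, 9, 2), date(2008, 9, 3), date(2008, 9, 4)]

steady_score = correlation_value(steady, start, end)
burst_score = correlation_value(burst, start, end)
```

Both lists contain four links, but under this assumed rule the steadily recurring links score higher than the short burst, which reflects the intent of weighting by continued appearance rather than by raw count alone.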
- FIG. 8 is a flowchart showing the flow of processing in the information analysis method according to Embodiment 2 of the present invention.
- the information analysis method in the second embodiment is carried out by operating the information analysis apparatus 5 in the second embodiment shown in FIG. The method is therefore described below together with the operation of the information analysis apparatus 5, with appropriate reference to FIG.
- the input unit 2 accepts input of one of the language expressions to be analyzed (step B1).
- the language expression (input language expression) for which the input is accepted is input to the link information generation unit 3 and the language expression generation unit 6.
- the number of language expressions that can be accepted is not limited to one, and may be two or more.
- even when two or more language expressions are input, however, no correlation value is calculated between the input language expressions themselves.
- the language expression generation unit 6 generates a related language expression based on the input language expression (step B2).
- the generated related language expression is input to the link information generation unit 3.
- the link information generation unit 3 accesses the document storage unit 11 of the storage device 10 and searches for electronic documents that include at least one of the input language expression and the related language expression (step B3).
- an electronic document including at least one of the language expression X and the language expression Y is searched.
- the link information generation unit 3 extracts, from the plurality of electronic documents specified by the search, the time information possessed by each of the electronic documents and the relationships between the electronic documents (step B4).
- the link information generation unit 3 detects the link between the input language expression and the related language expression and the appearance time of the link based on the time information and the relationships between the electronic documents extracted in step B4, and generates link information (step B5).
- the correlation value calculation unit 4 specifies, from the link information generated in step B5, the number of appearances of the link between the input language expression and the related language expression and the appearance time of each link. Using the number of appearances and the appearance time of each link, it calculates the correlation value R in accordance with the degree to which the link continues to appear (step B6). In the second embodiment as well, the correlation value R is calculated using any one of formulas (1), (2), (5), and (7) shown in the first embodiment, or a combination of them.
- the correlation value calculation unit 4 outputs the calculated correlation value R to the output device 13.
- the processing in the information analysis device 5 ends.
- the information analysis device 5 is in a stopped state until the next language expression is input.
- in the second embodiment, when a language expression is input, a language expression related to it is generated, and a correlation value between the two is calculated.
- the second embodiment is effective when it is desired to acquire a keyword related to a keyword that is attracting attention. Also in the second embodiment, as in the first embodiment, the reliability of the correlation value is high, and a situation in which an error occurs in the determination of the correlation due to an accidental cause is avoided.
- the present invention can be applied to uses such as an information search apparatus using time series relation as a search condition and an information classification apparatus using determination result of time series relation.
- the present invention can also be applied to uses such as a text mining device for the purpose of finding information related to the language expression to be analyzed.
1 Information analysis device (Embodiment 1)
2 Input unit
3 Link information generation unit
4 Correlation value calculation unit
5 Information analysis device (Embodiment 2)
6 Language expression generation unit
10 Storage device
11 Document storage unit
12 Input device
13 Output device
20 Computer device
21 CPU
22 RAM
23 ROM
24 Interface circuit
25 Magnetic disk storage device
26 Reading device
27 Video card
28 Keyboard
29 Display device
30 Recording medium
Description
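This application claims priority based on Japanese Patent Application No. 2008-245162, filed in Japan on September 25, 2008, the contents of which are incorporated herein by reference.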
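comprising a link information generation unit and a correlation value calculation unit, wherein
the link information generation unit extracts, from a plurality of electronic documents each including at least one of the plurality of language expressions, time information possessed by each of the plurality of electronic documents and a relationship between electronic documents in the plurality of electronic documents, further detects, based on the extracted time information and the relationship between the electronic documents, a link between one language expression and another language expression among the plurality of language expressions and an appearance time of the link, and generates link information specifying the detected link and the appearance time of the link, and
the correlation value calculation unit specifies, from the link information, the number of appearances of links between the one language expression and the other language expression and the appearance time of each link, and, using the specified number of appearances of the links and the appearance time of each link, calculates a correlation value between the one language expression and the other language expression in accordance with the degree to which the link continues to appear.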
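(a) a step of extracting, from a plurality of electronic documents each including at least one of the plurality of language expressions, time information possessed by each of the plurality of electronic documents and a relationship between electronic documents in the plurality of electronic documents;
(b) a step of detecting, based on the time information and the relationship between the electronic documents extracted in step (a), a link between one language expression and another language expression among the plurality of language expressions and an appearance time of the link, and generating link information specifying the detected link and the appearance time of the link; and
(c) a step of specifying, from the link information generated in step (b), the number of appearances of links between the one language expression and the other language expression and the appearance time of each link, and calculating, using the specified number of appearances of the links and the appearance time of each link, a correlation value between the one language expression and the other language expression in accordance with the degree to which the link continues to appear; the method being characterized by having these steps.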
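A program for causing a computer to execute information analysis on a plurality of language expressions to be analyzed,
the program causing the computer to execute:
(a) a step of extracting, from a plurality of electronic documents each including at least one of the plurality of language expressions, time information possessed by each of the plurality of electronic documents and a relationship between electronic documents in the plurality of electronic documents;
(b) a step of detecting, based on the time information and the relationship between the electronic documents extracted in step (a), a link between one language expression and another language expression among the plurality of language expressions and an appearance time of the link, and generating link information specifying the detected link and the appearance time of the link; and
(c) a step of specifying, from the link information generated in step (b), the number of appearances of links between the one language expression and the other language expression and the appearance time of each link, and calculating, using the specified number of appearances of the links and the appearance time of each link, a correlation value between the one language expression and the other language expression in accordance with the degree to which the link continues to appear.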
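Hereinafter, the information analysis device, information analysis method, and program according to Embodiment 1 of the present invention will be described with reference to FIGS. 1 to 6. First, the configuration of the information analysis device according to Embodiment 1 will be described with reference to FIGS. 1 to 4. FIG. 1 is a block diagram showing a schematic configuration of the information analysis device according to Embodiment 1 of the present invention. FIG. 2 is a diagram showing an example of the information stored in the storage device shown in FIG. 1. FIG. 3 is a diagram showing an example of the link information generated in Embodiment 1 of the present invention. FIG. 4 is a diagram showing another example of the link information generated in Embodiment 1 of the present invention.
Next, the information analysis device, information analysis method, and program according to Embodiment 2 of the present invention will be described with reference to FIGS. 7 and 8. First, the configuration of the information analysis device according to Embodiment 2 of the present invention will be described with reference to FIG. 7. FIG. 7 is a block diagram showing a schematic configuration of the information analysis device according to Embodiment 2 of the present invention.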
Claims (21)
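- An information analysis device that analyzes a plurality of language expressions, comprising
a link information generation unit and a correlation value calculation unit, wherein
the link information generation unit extracts, from a plurality of electronic documents each including at least one of the plurality of language expressions, time information possessed by each of the plurality of electronic documents and a relationship between electronic documents in the plurality of electronic documents, further detects, based on the extracted time information and the relationship between the electronic documents, a link between one language expression and another language expression among the plurality of language expressions and an appearance time of the link, and generates link information specifying the detected link and the appearance time of the link, and
the correlation value calculation unit specifies, from the link information, the number of appearances of links between the one language expression and the other language expression and the appearance time of each link, and, using the specified number of appearances of the links and the appearance time of each link, calculates a correlation value between the one language expression and the other language expression in accordance with the degree to which the link continues to appear.
- The information analysis device according to claim 1, wherein the link information generation unit extracts, as the relationship between electronic documents in the plurality of electronic documents, a reference relationship between one electronic document and another electronic document among the plurality of electronic documents.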
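- The information analysis device according to claim 1, wherein the link information generation unit extracts, as the relationship between electronic documents in the plurality of electronic documents, a similarity relationship between one electronic document and another electronic document among the plurality of electronic documents.
- The information analysis device according to claim 1, wherein the link information generation unit extracts, as the relationship between electronic documents in the plurality of electronic documents, an opposing relationship between one electronic document and another electronic document among the plurality of electronic documents.
- The information analysis device according to any one of claims 1 to 4, wherein the link information generation unit uses, as the appearance time of the link, either or both of the time included in the time information of the electronic document including the one language expression and the time included in the time information of the electronic document including the other language expression.
- The information analysis device according to any one of claims 1 to 5, wherein the link information generation unit obtains a time intermediate between the time included in the time information of the electronic document including the one language expression and the time included in the time information of the electronic document including the other language expression, and uses the obtained intermediate time as the appearance time of the link.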
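- The information analysis device according to any one of claims 1 to 6, further comprising an input unit that accepts input of a first language expression to be analyzed, and
a language expression generation unit that generates a second language expression related to the first language expression, wherein
the link information generation unit extracts, from electronic documents each including at least one of the first language expression and the second language expression, time information possessed by the electronic documents and a relationship between the electronic documents, further detects, based on the extracted time information and the relationship between the electronic documents, a link between the first language expression and the second language expression and an appearance time of the link, and generates link information specifying the detected link and the appearance time of the link, and
the correlation value calculation unit specifies, from the link information, the number of appearances of links between the first language expression and the second language expression and the appearance time of each link, and, using the specified number of appearances of the links and the appearance time of each link, calculates a correlation value between the first language expression and the second language expression in accordance with the degree to which the link continues to appear.
- An information analysis method for analyzing a plurality of language expressions, comprising:
(a) a step of extracting, from a plurality of electronic documents each including at least one of the plurality of language expressions, time information possessed by each of the plurality of electronic documents and a relationship between electronic documents in the plurality of electronic documents;
(b) a step of detecting, based on the time information and the relationship between the electronic documents extracted in step (a), a link between one language expression and another language expression among the plurality of language expressions and an appearance time of the link, and generating link information specifying the detected link and the appearance time of the link; and
(c) a step of specifying, from the link information generated in step (b), the number of appearances of links between the one language expression and the other language expression and the appearance time of each link, and calculating, using the specified number of appearances of the links and the appearance time of each link, a correlation value between the one language expression and the other language expression in accordance with the degree to which the link continues to appear.
- The information analysis method according to claim 8, wherein, in step (a), a reference relationship between one electronic document and another electronic document among the plurality of electronic documents is extracted as the relationship between electronic documents in the plurality of electronic documents.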
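- The information analysis method according to claim 8, wherein, in step (a), a similarity relationship between one electronic document and another electronic document among the plurality of electronic documents is extracted as the relationship between electronic documents in the plurality of electronic documents.
- The information analysis method according to claim 8, wherein, in step (a), an opposing relationship between one electronic document and another electronic document among the plurality of electronic documents is extracted as the relationship between electronic documents in the plurality of electronic documents.
- The information analysis method according to any one of claims 8 to 11, wherein, in step (b), either or both of the time included in the time information of the electronic document including the one language expression and the time included in the time information of the electronic document including the other language expression are used as the appearance time of the link.
- The information analysis method according to any one of claims 8 to 11, wherein, in step (b), a time intermediate between the time included in the time information of the electronic document including the one language expression and the time included in the time information of the electronic document including the other language expression is obtained, and the obtained intermediate time is used as the appearance time of the link.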
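- The information analysis method according to any one of claims 8 to 13, further comprising (d) a step of accepting input of a first language expression to be analyzed, and
(e) a step of generating a second language expression related to the first language expression, wherein
steps (d) and (e) are executed before step (a),
in step (a), time information possessed by the electronic documents and a relationship between the electronic documents are extracted from electronic documents each including at least one of the first language expression and the second language expression,
in step (b), a link between the first language expression and the second language expression and an appearance time of the link are detected based on the time information and the relationship between the electronic documents extracted in step (a), and link information specifying the detected link and the appearance time of the link is generated, and
in step (c), the number of appearances of links between the first language expression and the second language expression and the appearance time of each link are specified from the link information generated in step (b), and, using the specified number of appearances of the links and the appearance time of each link, a correlation value between the first language expression and the second language expression is calculated in accordance with the degree to which the link continues to appear.
- A program for causing a computer to execute information analysis on a plurality of language expressions to be analyzed,
the program causing the computer to execute:
(a) a step of extracting, from a plurality of electronic documents each including at least one of the plurality of language expressions, time information possessed by each of the plurality of electronic documents and a relationship between electronic documents in the plurality of electronic documents;
(b) a step of detecting, based on the time information and the relationship between the electronic documents extracted in step (a), a link between one language expression and another language expression among the plurality of language expressions and an appearance time of the link, and generating link information specifying the detected link and the appearance time of the link; and
(c) a step of specifying, from the link information generated in step (b), the number of appearances of links between the one language expression and the other language expression and the appearance time of each link, and calculating, using the specified number of appearances of the links and the appearance time of each link, a correlation value between the one language expression and the other language expression in accordance with the degree to which the link continues to appear.
- The program according to claim 15, wherein, in step (a), a reference relationship between one electronic document and another electronic document among the plurality of electronic documents is extracted as the relationship between electronic documents in the plurality of electronic documents.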
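- The program according to claim 15, wherein, in step (a), a similarity relationship between one electronic document and another electronic document among the plurality of electronic documents is extracted as the relationship between electronic documents in the plurality of electronic documents.
- The program according to claim 15, wherein, in step (a), an opposing relationship between one electronic document and another electronic document among the plurality of electronic documents is extracted as the relationship between electronic documents in the plurality of electronic documents.
- The program according to any one of claims 15 to 18, wherein, in step (b), either or both of the time included in the time information of the electronic document including the one language expression and the time included in the time information of the electronic document including the other language expression are used as the appearance time of the link.
- The program according to any one of claims 15 to 18, wherein, in step (b), a time intermediate between the time included in the time information of the electronic document including the one language expression and the time included in the time information of the electronic document including the other language expression is obtained, and the obtained intermediate time is used as the appearance time of the link.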
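- The program according to any one of claims 15 to 20, further causing the computer to execute, before step (a), (d) a step of accepting input of a first language expression to be analyzed, and
(e) a step of generating a second language expression related to the first language expression, wherein
in step (a), time information possessed by the electronic documents and a relationship between the electronic documents are extracted from electronic documents each including at least one of the first language expression and the second language expression,
in step (b), a link between the first language expression and the second language expression and an appearance time of the link are detected based on the time information and the relationship between the electronic documents extracted in step (a), and link information specifying the detected link and the appearance time of the link is generated, and
in step (c), the number of appearances of links between the first language expression and the second language expression and the appearance time of each link are specified from the link information generated in step (b), and, using the specified number of appearances of the links and the appearance time of each link, a correlation value between the first language expression and the second language expression is calculated in accordance with the degree to which the link continues to appear.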
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/057,842 US8612202B2 (en) | 2008-09-25 | 2009-09-04 | Correlation of linguistic expressions in electronic documents with time information |
JP2010530706A JP5387577B2 (ja) | 2008-09-25 | 2009-09-04 | 情報分析装置、情報分析方法、及びプログラム |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2008-245162 | 2008-09-25 | ||
JP2008245162 | 2008-09-25 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2010035412A1 (ja) | 2010-04-01 |
Family
ID=42059426
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2009/004399 WO2010035412A1 (ja) | 2008-09-25 | 2009-09-04 | 情報分析装置、情報分析方法、及びプログラム |
Country Status (3)
Country | Link |
---|---|
US (1) | US8612202B2 (ja) |
JP (1) | JP5387577B2 (ja) |
WO (1) | WO2010035412A1 (ja) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2015072085A1 (ja) * | 2013-11-12 | 2015-05-21 | 日本電気株式会社 | ログ分析システム、ログ分析方法、および、記憶媒体 |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP6394388B2 (ja) | 2012-03-30 | 2018-09-26 | 日本電気株式会社 | 同義関係判定装置、同義関係判定方法、及びそのプログラム |
US9313284B2 (en) | 2013-03-14 | 2016-04-12 | International Business Machines Corporation | Smart posting with data analytics and semantic analysis to improve a message posted to a social media service |
JP6326786B2 (ja) * | 2013-11-29 | 2018-05-23 | ブラザー工業株式会社 | プログラム、情報処理装置、および通信システム |
JP6842167B2 (ja) * | 2017-05-08 | 2021-03-17 | 国立研究開発法人情報通信研究機構 | 要約生成装置、要約生成方法及びコンピュータプログラム |
JP7100797B2 (ja) * | 2017-12-28 | 2022-07-14 | コニカミノルタ株式会社 | 文書スコアリング装置、プログラム |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH10320419A (ja) * | 1997-05-22 | 1998-12-04 | Nippon Telegr & Teleph Corp <Ntt> | 情報関連づけ装置およびその方法 |
JPH11312168A (ja) * | 1998-04-28 | 1999-11-09 | Nippon Telegr & Teleph Corp <Ntt> | 同義語計算装置及びその方法並びに同義語計算プログラムを記録した媒体 |
JP2006039811A (ja) * | 2004-07-26 | 2006-02-09 | Fuji Xerox Co Ltd | ドキュメント管理プログラム、ドキュメント管理方法、及びドキュメント管理装置 |
JP2007079730A (ja) * | 2005-09-12 | 2007-03-29 | Oki Electric Ind Co Ltd | 単語類似判断装置、方法及びプログラム |
JP2008152634A (ja) * | 2006-12-19 | 2008-07-03 | Nippon Telegr & Teleph Corp <Ntt> | 潜在話題抽出装置、潜在話題抽出方法、プログラムおよび記録媒体 |
Family Cites Families (26)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH0713598A (ja) | 1993-06-24 | 1995-01-17 | Osaka Gas Co Ltd | 特定タスク音声データベース生成装置 |
JPH09153050A (ja) * | 1995-11-29 | 1997-06-10 | Hitachi Ltd | 文書情報収集方法および文書情報収集装置 |
JPH10143371A (ja) | 1996-11-13 | 1998-05-29 | Mitsubishi Electric Corp | 事例検索システム及び事例検索方法 |
JP3634099B2 (ja) * | 1997-02-17 | 2005-03-30 | 株式会社リコー | 文書情報管理システム,媒体用紙情報作成装置および文書情報管理装置 |
US6782393B1 (en) * | 2000-05-31 | 2004-08-24 | Ricoh Co., Ltd. | Method and system for electronic message composition with relevant documents |
JP2004139553A (ja) * | 2002-08-19 | 2004-05-13 | Matsushita Electric Ind Co Ltd | 文書検索システムおよび質問応答システム |
JP3600611B2 (ja) * | 2002-12-12 | 2004-12-15 | 本田技研工業株式会社 | 情報処理装置および情報処理方法、並びに情報処理プログラム |
TW200512599A (en) * | 2003-09-26 | 2005-04-01 | Avectec Com Inc | Method for keyword correlation analysis |
US8131702B1 (en) * | 2004-03-31 | 2012-03-06 | Google Inc. | Systems and methods for browsing historical content |
US8335785B2 (en) * | 2004-09-28 | 2012-12-18 | Hewlett-Packard Development Company, L.P. | Ranking results for network search query |
JP4466334B2 (ja) | 2004-11-08 | 2010-05-26 | 日本電信電話株式会社 | 情報分類方法及び装置及びプログラム及びプログラムを格納した記憶媒体 |
JP2006164045A (ja) | 2004-12-09 | 2006-06-22 | Nippon Telegr & Teleph Corp <Ntt> | 共起グラフ作成方法及び装置及びプログラム及びプログラムを格納した記憶媒体 |
US8438142B2 (en) * | 2005-05-04 | 2013-05-07 | Google Inc. | Suggesting and refining user input based on original user input |
US7739254B1 (en) * | 2005-09-30 | 2010-06-15 | Google Inc. | Labeling events in historic news |
JP4806644B2 (ja) * | 2007-03-15 | 2011-11-02 | 富士通株式会社 | ジャンプ先サイト決定プログラム、記録媒体、ジャンプ先サイト決定方法、およびジャンプ先サイト決定装置 |
KR100881832B1 (ko) * | 2007-03-30 | 2009-02-03 | 엔에이치엔(주) | 최적의 랜딩 페이지 검색을 통한 키워드 광고 노출 방법 및시스템 |
US8521674B2 (en) * | 2007-04-27 | 2013-08-27 | Nec Corporation | Information analysis system, information analysis method, and information analysis program |
US8290921B2 (en) * | 2007-06-28 | 2012-10-16 | Microsoft Corporation | Identification of similar queries based on overall and partial similarity of time series |
US8037086B1 (en) * | 2007-07-10 | 2011-10-11 | Google Inc. | Identifying common co-occurring elements in lists |
US8442969B2 (en) * | 2007-08-14 | 2013-05-14 | John Nicholas Gross | Location based news and search engine |
US7962437B2 (en) * | 2007-11-16 | 2011-06-14 | International Business Machines Corporation | Data comparison using different time periods in data sequences |
US7809721B2 (en) * | 2007-11-16 | 2010-10-05 | Iac Search & Media, Inc. | Ranking of objects using semantic and nonsemantic features in a system and method for conducting a search |
US20100318526A1 (en) * | 2008-01-30 | 2010-12-16 | Satoshi Nakazawa | Information analysis device, search system, information analysis method, and information analysis program |
JP5136910B2 (ja) * | 2008-01-30 | 2013-02-06 | 日本電気株式会社 | 情報分析装置、情報分析方法、情報分析用プログラム、及び検索システム |
JP5224868B2 (ja) * | 2008-03-28 | 2013-07-03 | 株式会社東芝 | 情報推薦装置および情報推薦方法 |
US8407214B2 (en) * | 2008-06-25 | 2013-03-26 | Microsoft Corp. | Constructing a classifier for classifying queries |
-
2009
- 2009-09-04 US US13/057,842 patent/US8612202B2/en active Active
- 2009-09-04 JP JP2010530706A patent/JP5387577B2/ja active Active
- 2009-09-04 WO PCT/JP2009/004399 patent/WO2010035412A1/ja active Application Filing
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2015072085A1 (ja) * | 2013-11-12 | 2015-05-21 | 日本電気株式会社 | ログ分析システム、ログ分析方法、および、記憶媒体 |
JPWO2015072085A1 (ja) * | 2013-11-12 | 2017-03-16 | 日本電気株式会社 | ログ分析システム、ログ分析方法、および、プログラム |
Also Published As
Publication number | Publication date |
---|---|
JPWO2010035412A1 (ja) | 2012-02-16 |
US8612202B2 (en) | 2013-12-17 |
US20110137641A1 (en) | 2011-06-09 |
JP5387577B2 (ja) | 2014-01-15 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 09815834; Country of ref document: EP; Kind code of ref document: A1 |
WWE | Wipo information: entry into national phase | Ref document number: 13057842; Country of ref document: US |
WWE | Wipo information: entry into national phase | Ref document number: 2010530706; Country of ref document: JP |
NENP | Non-entry into the national phase | Ref country code: DE |
122 | Ep: pct application non-entry in european phase | Ref document number: 09815834; Country of ref document: EP; Kind code of ref document: A1 |