WO2002071277A1 - Document and information retrieval method and apparatus - Google Patents
Document and information retrieval method and apparatus
- Publication number
- WO2002071277A1 (PCT/US2002/006053)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- query
- index
- vector
- item
- vectors
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/3332—Query translation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3347—Query execution using vector based model
Definitions
- the present invention relates to document and/or information retrieval in which documents and/or information relevant to an input query are retrieved, and more particularly to a retrieval method and apparatus wherein an input query including plural terms related to each other by Boolean logic is transformed into vector form.
- the Boolean model, the extended Boolean model, the fuzzy set model, the vector space model, the probabilistic model, and the network model are prior art examples of models for information retrieval technology.
- These prior art information retrieval models are detailed in Takenobu Tokunaga: “Information Retrieval and Language Processing” (University of Tokyo Press, 1999) and Ricardo Baeza-Yates and Berthier Ribeiro-Neto: “Modern Information Retrieval” (Addison-Wesley, 1999).
- the Boolean model is the most classic and fundamental model, while the vector space model is the most popular model.
- relevant documents are located by logically collating terms of an input query, described by a Boolean logic operator, such as AND, OR or NOT, with query terms associated with each document so as to locate relevant documents.
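The logical collation described for the Boolean model can be illustrated with a minimal sketch. The nested-tuple query encoding, the `matches` helper and the sample documents are assumptions made for this example and are not taken from the patent.

```python
# Illustrative Boolean retrieval (hypothetical encoding): a query is a
# nested tuple of operators and terms, and each document is represented
# by the set of index terms associated with it.

def matches(query, doc_terms):
    """Return True if the document's term set satisfies the Boolean query."""
    if isinstance(query, str):  # a bare term
        return query in doc_terms
    op, *args = query
    if op == "AND":
        return all(matches(a, doc_terms) for a in args)
    if op == "OR":
        return any(matches(a, doc_terms) for a in args)
    if op == "NOT":
        return not matches(args[0], doc_terms)
    raise ValueError(f"unknown operator: {op}")

# Made-up documents for illustration only.
docs = {
    "d1": {"restaurant", "italian"},
    "d2": {"restaurant", "chinese", "delivery"},
    "d3": {"hotel", "italian"},
}
query = ("AND", "restaurant", ("OR", "italian", "chinese"), ("NOT", "delivery"))
hits = [name for name, terms in docs.items() if matches(query, terms)]
```

A document either satisfies the expression or it does not, which is why this model by itself gives no ordering among the hits.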
- a vector component describes and corresponds to each term of an input query.
- the values of the vector components associated with the input query are set at "one".
- Each document is described by a document vector having a value of one or zero according to the presence or absence of the corresponding query term in the document.
- the component values are often weighted.
- the similarity of the query vector with a document vector is measured to indicate the degree of relevance between the query and the document.
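As a rough illustration of the vector space model described above, the sketch below scores binary document vectors against an all-ones query vector. The text does not name the similarity measure; cosine similarity is assumed here because it is a common choice, and the sample vectors are invented.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# One component per query term; every query component is set to one.
query_vec = [1, 1, 1]

# Binary document vectors: 1 if the query term is present, 0 if absent.
doc_vecs = {"d1": [1, 1, 0], "d2": [1, 0, 0], "d3": [0, 0, 0]}

# Rank documents by decreasing similarity to the query.
ranked = sorted(doc_vecs, key=lambda d: cosine(query_vec, doc_vecs[d]), reverse=True)
```

Unlike the Boolean model, this produces a graded score, so results can be sorted; weighted components would simply replace the 0/1 entries.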
- the vector space model is generally considered superior to the Boolean model in:
- the vector space model does not have the ability to describe the logical relationship between the user and document query terms.
- a feature of the Boolean model is that the logical relationship is established by Boolean logic functions, e.g., the Boolean AND or OR functions.
- the Boolean model is not able to weight the query terms according to importance to the user, and the retrieved results cannot be sorted in accordance with the degrees of relevance.
- the vector space model describes such a user's query as :
- the extended Boolean model overcomes the problems associated with items (1) and (2) but does not solve the problem of item (3).
- a desired document item or information item is retrieved from a plurality of document items and/or information items in response to a query.
- the items are identified by item index vectors.
- the query includes plural query terms related to each other in Boolean logic form.
- the method comprises transforming the query terms in Boolean logic form into a transformed vector form, and retrieving the desired item in response to similarity measurements of (1) the transformed vector form of the query terms and (2) the index vectors.
- the transforming step includes (1) calculating a square sum matrix by using a plurality of first index vectors having components indicating the presence or absence of each of the plural query terms, and (2) calculating eigenvectors and eigenvalues of the square sum matrix.
- the square sum matrix is calculated on the basis of all the first index vectors which are generated from the query.
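A minimal sketch of the square sum matrix calculation follows. The defining equation is not reproduced in this text, so the sketch assumes the usual reading of the name: the sum of the outer products f fᵀ over the index vectors. The function name and the sample vectors are illustrative.

```python
def square_sum_matrix(vectors):
    """Sum of outer products f f^T over a list of equal-length index vectors.

    Assumed form of the square sum matrix; the source equation is elided.
    """
    n = len(vectors[0])
    S = [[0.0] * n for _ in range(n)]
    for f in vectors:
        for i in range(n):
            for j in range(n):
                S[i][j] += f[i] * f[j]
    return S

# Three made-up binary index vectors over four query terms.
index_vectors = [[1, 0, 1, 1], [1, 1, 0, 1], [1, 1, 1, 1]]
S = square_sum_matrix(index_vectors)
```

Under this assumption, the diagonal entry S[i][i] counts how many index vectors contain term i, and S[i][j] counts co-occurrences of terms i and j, so the matrix is symmetric.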
- the transforming step further preferably includes selecting basis vectors from the eigenvectors, wherein the basis vectors constitute a subspace.
- the similarity measurements preferably include calculating inner products between the first index vectors and the basis vectors, and weighted coefficients employing the eigenvalues.
- the retrieving step preferably includes comparing the similarity measurement with a predetermined threshold to determine whether or not each of the items is relevant to the query.
- the weighted coefficients and the threshold are preferably varied to maximize a predetermined evaluation measure.
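The similarity measurement and threshold comparison described above might be sketched as follows. Equation (2) is not reproduced in this text, so the exact form used here (a weighted sum of squared inner products with the basis vectors, normalized by the squared length of the index vector) is an assumption, as are the function names.

```python
def subspace_similarity(f, basis, weights):
    """Weighted sum of squared projections of f onto the basis vectors.

    Assumed form: sum_m w_m * (phi_m . f)^2 / |f|^2, where the weights
    w_m are derived from the eigenvalues.
    """
    norm_sq = sum(x * x for x in f)
    if norm_sq == 0:
        return 0.0
    total = 0.0
    for phi, w in zip(basis, weights):
        proj = sum(a * b for a, b in zip(phi, f))
        total += w * proj * proj
    return total / norm_sq

def is_relevant(f, basis, weights, threshold):
    """Judge an index vector relevant when its similarity exceeds the threshold."""
    return subspace_similarity(f, basis, weights) > threshold

# Toy orthonormal basis with hypothetical weights.
basis = [[1.0, 0.0], [0.0, 1.0]]
weights = [1.0, 0.5]
score = subspace_similarity([1, 1], basis, weights)
```

Both the weights and the threshold are left as inputs because the text says they are varied to maximize an evaluation measure.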
- the square sum matrix is preferably modified in response to at least one of (1) the first index vector being judged as being irrelevant in spite of being relevant to the query, and (2) the first index vector being judged as being relevant in spite of being irrelevant to the query.
- the square sum matrix is preferably modified in one embodiment in response to at least one of a user deciding (1) that a retrieval item agrees with the query, and (2) that a retrieval item does not agree with the query.
- the square sum matrix is modified on the basis of another index vector having a component that indicates the presence or absence of each index term included in each of the items.
- a feedback vector is preferably calculated by using an average vector of the another index vectors which are included in the item judged by the user to agree with the user's request, or an average vector of the another index vectors which are included in the item judged by the user to disagree.
- a second similarity between each of the another index vectors and the feedback vector is measured.
- a third similarity is calculated by using the first mentioned similarity measurement and the second similarity measurement. The relevance of each item to the query is judged by comparing the third similarity measurement with a predetermined threshold.
- the square sum matrix preferably is calculated in accordance with: where, fi denotes index vectors of the items relevant to
- ⁇ x denotes a set of all the first index
- the retrieval is preferably performed with an apparatus comprising:
- an input unit for accepting a query including plural query terms related to each other by Boolean logic.
- a data processing unit connected to be responsive to the input unit.
- the data processing unit is programmed to be responsive to the query for (1) transforming the query into vector form, and (2) measuring the similarities of the item index vectors and the vector form of the query to determine which of the items correspond with the query.
- An output device connected to be responsive to the data processing unit provides an indication of the determination of which of the items correspond with the query.
- the data processing unit is preferably programmed to:
- the data processing unit is preferably programmed to modify the square sum matrix in response to at least one of (1) the first index vector being judged irrelevant in spite of being relevant to the query, and (2) the first index vector being judged relevant in spite of being irrelevant to the query.
- Another aspect of the invention relates to a program for controlling a data processing unit used to assist in retrieving a document item and/or an information item from a plurality of document items and/or information items.
- Each of the items is identified by an index vector.
- the retrieval is in response to a query including plural query terms related to each other by Boolean logic.
- the program causes the data processing unit to transform the query into vector form and to measure the similarities of the item index vectors and the vector form of the query to determine which of the items correspond with the query.
- the program causes the data processing unit to modify the square sum matrix in response to at least one of (1) the first index vector being judged irrelevant in spite of being relevant to the query and (2) the first index vector being judged relevant in spite of being irrelevant to the query.
- Fig. 1A is a flow chart of operations of a first preferred embodiment of the present invention
- Fig. 1B is a flow chart helpful in describing how the step of determining parameters for relevance judgement of Fig. 1A is preferably performed;
- Fig. 2 is a flow chart of operations of a second preferred embodiment of the present invention.
- Fig. 3A is a flow chart of operations of a third preferred embodiment of the present invention
- Fig. 3B is a flow chart of operations of a fourth preferred embodiment of the present invention
- Fig. 4 is a flow chart of operations of a fifth embodiment of the present invention.
- Fig. 5 is a block diagram of a preferred embodiment of document retrieval apparatus according to the present invention.
- let $Q(w_1, \ldots, w_N)$ (hereinafter referred to as Q) denote a query based on the Boolean model.
- the relevance of document i to query Q is determined by matching the vector $f_i$ for document i and several N-dimensional vectors obtained by transforming Boolean query Q into a vector.
- Boolean query Q consists of N query terms, sometimes referred to as index terms.
- query Q can be theoretically transformed into $2^N - 1$ index vectors.
- let ⁇ ' ⁇ denote the set of all possible index vectors that documents relevant to the query Q can take among the $2^N - 1$ index vectors, and let ⁇ 0 denote the set of index vectors
- (1) Shibuya, (2) Chinese food, (3) Italian food and (4) Restaurant are respectively transformed into the first, second, third and fourth terms of each of $f_1^T$, $f_2^T$ and $f_3^T$.
- the first and fourth terms of each of $f_1^T$, $f_2^T$ and $f_3^T$ have the binary value "1".
- the second and third terms are respectively the binary values 0 and 1, indicating that the document to be retrieved does not have Chinese food, but does have Italian food.
- the second and third terms are respectively the binary values 1 and 0, indicating that the document to be retrieved has Chinese food, but does not have Italian food.
- the second and third terms are respectively the binary values 1 and 1, indicating that the document to be retrieved has Chinese and Italian food.
- vector set ⁇ o is the set of vectors other than the three
- document j is indicated as a member of set ⁇ i.
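The split of the $2^N - 1$ candidate index vectors into relevant and irrelevant sets can be sketched for this four-term example. The Boolean expression itself is not reproduced in this text; the expression below (Shibuya AND (Chinese food OR Italian food) AND Restaurant) is inferred from the three relevant vectors described above and should be read as an assumption.

```python
from itertools import product

TERMS = ["Shibuya", "Chinese food", "Italian food", "Restaurant"]

def satisfies_query(v):
    """Assumed query: Shibuya AND (Chinese food OR Italian food) AND Restaurant."""
    shibuya, chinese, italian, restaurant = v
    return bool(shibuya and (chinese or italian) and restaurant)

# All non-zero binary vectors of length N: there are 2^N - 1 of them.
all_vectors = [v for v in product((0, 1), repeat=len(TERMS)) if any(v)]

# Vectors that a relevant document can take, and the remaining ones.
relevant = [v for v in all_vectors if satisfies_query(v)]
irrelevant = [v for v in all_vectors if not satisfies_query(v)]
```

With N = 4 this yields 15 candidate vectors, of which exactly three satisfy the assumed expression: (1, 0, 1, 1), (1, 1, 0, 1) and (1, 1, 1, 1).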
- Sentence Vector Set Model (SVSM) similarity (as disclosed in Takahiko Kawatani: “Text Processing by Sentence Vector Set Model”, Research Report on Natural Language Processing, Information Processing Society of Japan, 2000-NL-140, pp. 31-38 (2000)) is employed as the similarity scale. According to the SVSM-similarity, the similarity between a vector fi and a vector set ⁇ i can be exactly measured using the eigenvectors and eigenvalues of a square sum matrix (to)
- Document retrieval apparatus 100 is a computer system including input unit 110 which responds to user inputs representing a Boolean query Q for retrieving any desired document.
- a feedback arrangement (not shown) supplies feedback information based on retrieved results to input unit 110.
- Input unit 110 derives an output signal indicative of the query supplied to input unit 110.
- the output signal of unit 110 is connected to an input of calculation unit 120.
- Unit 120, typically a central processing unit of a programmed general purpose computer, retrieves documents stored in a document file 130 in response to the query from input unit 110 and index information in the stored documents.
- Calculating unit 120 includes a memory system 122 comprising a random access memory (RAM) and a programmed memory for causing unit 120 to execute the steps of Figs. 1, 2, 3 or 4.
- Unit 120 responds to the output signal of unit 110 and the index information in the stored documents and the programmed memory of system 122 to deliver the retrieved results to output unit 140.
- Figs. 1A and 1B together are a flow chart of the steps that calculation unit 120 performs in accordance with a first embodiment of the present invention, i.e., the steps stored in the programmed memory of memory system 122.
- calculating unit 120 responds to query Q in Boolean logic form from input unit 110. Then, during step 12, calculating unit 120 calculates a square sum matrix, and during step 13 unit 120 calculates eigenvalues and eigenvectors of a transformation from Boolean logic into vector form of the relationship between the input query and the index information in the documents that file 130 stores. During step 14, unit 120 determines parameters for relevance of the input query to the index information in the documents that file 130 stores. Unit 120 executes a retrieval operation during step 15.
- input unit 110 derives query Q in
- unit 120 calculates the square sum matrix S from all the vectors included in the vector set
- Unit 120 calculates the eigenvalues and eigenvectors of square sum matrix S during eigenvalue/eigenvector calculation step 13.
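The text does not say how the eigenvalues and eigenvectors of S are computed. One possible approach for a small symmetric matrix is the cyclic Jacobi method, sketched here in plain Python; `jacobi_eigen` and its parameters are illustrative names, not part of the patent.

```python
import math

def jacobi_eigen(S, max_sweeps=100, tol=1e-20):
    """Eigen-decomposition of a symmetric matrix by cyclic Jacobi rotations.

    Returns (eigenvalue, eigenvector) pairs sorted by decreasing eigenvalue.
    """
    n = len(S)
    A = [[float(x) for x in row] for row in S]
    V = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        # Stop once the off-diagonal mass is negligible.
        off = sum(A[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if A[p][q] == 0.0:
                    continue
                # Rotation angle that zeroes A[p][q].
                if A[p][p] == A[q][q]:
                    theta = math.pi / 4
                else:
                    theta = 0.5 * math.atan(2.0 * A[p][q] / (A[q][q] - A[p][p]))
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # A <- A J (rotate columns p and q)
                    akp, akq = A[k][p], A[k][q]
                    A[k][p], A[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rotate rows p and q)
                    apk, aqk = A[p][k], A[q][k]
                    A[p][k], A[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V J (accumulate eigenvectors)
                    vkp, vkq = V[k][p], V[k][q]
                    V[k][p], V[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    pairs = [(A[i][i], [V[k][i] for k in range(n)]) for i in range(n)]
    pairs.sort(key=lambda t: t[0], reverse=True)
    return pairs

# Small symmetric example whose eigenvalues are 3 and 1.
pairs = jacobi_eigen([[2.0, 1.0], [1.0, 2.0]])
```

Sorting by decreasing eigenvalue makes it straightforward to keep only the leading eigenvectors as basis vectors.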
- unit 120 calculates the similarity ri between
- Unit 120 measures, i.e., calculates, the similarity $r_i$ for all the index vectors (numbering $2^N - 1$) which can be generated from query Q that has N index terms. To make the measurement, unit 120 experimentally determines in an iterative manner
- L is the number of eigenvalues and eigenvectors which are used.
- an L-dimensional subspace is spanned by L eigenvectors as basis vectors; L thus has minimum and maximum values 1 and R, respectively. It is advantageous for the value of L to be as small as possible, to reduce the processing time of unit 120 in retrieving a document from file 130, while achieving high performance.
- Each eigenvalue has an upper limit value ⁇ .
- eigenvalues scale, that is, weight, the similarity calculation of Equation (2).
- use of raw calculated eigenvalues does not always produce a favorable result. Consequently, the eigenvalues have a predetermined, selected upper limit value of ⁇. All eigenvalues greater than the predetermined value ⁇ are clipped to ⁇.
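The selection of L leading eigenvalues and their clipping to an upper limit can be sketched in a few lines. The function name `select_weights` and the sample numbers are hypothetical; how the clipped values enter Equation (2) is not shown in this text.

```python
def select_weights(eigenvalues, num_basis, upper_limit):
    """Keep the num_basis largest eigenvalues and clip each to upper_limit."""
    ordered = sorted(eigenvalues, reverse=True)
    return [min(lam, upper_limit) for lam in ordered[:num_basis]]

# Made-up eigenvalues: the largest exceeds the limit and is clipped.
clipped = select_weights([0.4, 5.2, 1.3], num_basis=2, upper_limit=2.0)
```

Clipping keeps one dominant eigenvalue from overwhelming the contribution of the other basis vectors to the similarity.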
- the threshold ⁇ which varies in steps from
- Fig. 1B is a detailed flow chart of iterative operations computer 100 performs during block 14 to
- an evaluation measure (referred to as the F-measure) are obtained.
- A (a number known prior to the retrieval process beginning) denotes the number of relevant index vectors
- B is the number of index vectors judged as being relevant as a result of the similarity value $r_i$ computed from Equation (2) (i.e., the number of index vectors having a similarity value that exceeds the threshold ⁇)
- C is the number of relevant index vectors judged as being relevant to query Q within B.
- unit 120 retrieves from its memory 122
- calculation unit 120 calculates A, B, C, R and P in accordance with the previously discussed principles using the value of ri calculated during step 17 and the value of the threshold. Then, during operation 19, calculation unit 120 calculates the F measure from the values of R and P determined during operation 18. Unit 120 calculates F in accordance with Equation (3). The value of F calculated during operation 19 is stored in the RAM of memory system
- unit 120 determines if calculation unit 120 has performed operations 17, 18 and 19
- unit 120 advances to operation 21
- Unit 120 repeats operations 17-19
- unit 120 advances to operation 21 during which unit 120 determines the maximum stored values of F (i.e., $F_{max}$) and
- After unit 120 has completed operation 21, unit 120 advances to retrieval step 15, Fig. 1A.
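The iterative parameter search of Fig. 1B can be sketched as a sweep over candidate thresholds that keeps the one with the highest F-measure. Equation (3) and the definitions of R and P are not reproduced in this text; the sketch assumes R = C/A (recall), P = C/B (precision) and F = 2RP/(R + P), the standard harmonic mean. The function names and sample data are illustrative.

```python
def f_measure(similarities, relevant_flags, threshold):
    """F-measure for one threshold (assumed form: harmonic mean of R and P)."""
    A = sum(relevant_flags)                        # relevant index vectors
    judged = [r > threshold for r in similarities]
    B = sum(judged)                                # vectors judged relevant
    C = sum(1 for j, rel in zip(judged, relevant_flags) if j and rel)
    if A == 0 or B == 0 or C == 0:
        return 0.0
    R = C / A  # assumed: recall
    P = C / B  # assumed: precision
    return 2 * R * P / (R + P)

def best_threshold(similarities, relevant_flags, candidates):
    """Return (F, threshold) for the candidate that maximizes F."""
    scored = [(f_measure(similarities, relevant_flags, t), t) for t in candidates]
    return max(scored)  # ties on F are broken toward the larger threshold

# Made-up similarities and relevance labels for four index vectors.
sims = [0.9, 0.8, 0.4, 0.3]
flags = [True, True, False, True]
best = best_threshold(sims, flags, [0.2, 0.5, 0.7])
```

The same sweep could be nested over the number of basis vectors and the eigenvalue upper limit, storing F for each combination and picking the maximum at the end.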
- unit 120 judges the relevance, i.e., similarity, of each of the documents stored in document file 130 to the transformed query. Unit 120 outputs the similarity as a retrieved result. Unit 120 determines the similarity by using Equation (2), where fi denotes the index vector of
- Fig. 2 is a flow diagram of operations that unit 120 performs in accordance with a second preferred embodiment of the present invention.
- Blocks 11-14 of Fig. 2 are the same as blocks 11-14 of Fig. 1A.
- input 110 supplies a Boolean logic query Q to calculation unit 120; during block 12, unit 120 calculates a square sum matrix; and during block 13, unit 120 calculates eigenvalues and eigenvectors.
- unit 120 determines parameters as in the embodiment of Fig. 1A.
- an inputted query Q in Boolean logic form is transformed into a vector form by the processing blocks 11 through 14.
- unit 120 advances to block 25 during which unit 120 judges whether or not feedback is necessary based on the result obtained from relevance judging parameter determination step 14.
- Unit 120 determines that feedback is necessary if unit 120 determines (1) an index vector is judged as irrelevant, in spite of being included among relevant index vectors, i.e., is in vector set ⁇ i, or (2) an index vector is judged as relevant in spite of being included among irrelevant index vectors, i.e., is in vector
- evaluation measure F does not converge in a
- If unit 120 determines that evaluation measure F has converged, or if measure F is being reduced each time feedback is repeated, feedback is not executed and the program advances to retrieval operation 15 that is performed as described in connection with Fig. 1A. Retrieval operation 15 is also performed if unit 120 determines during step 25 that feedback is not necessary.
- If unit 120 determines feedback is necessary and determines, during operation 24, that F has not converged, unit 120 advances to block 26 during which unit 120 selects vectors for the feedback operation. Unit 120 then advances to block 27 during which unit 120
- Unit 120 then advances to block 28 during which the unit determines parameters for enabling similarities, i.e., relevance, judgements to be performed.
- unit 120 determines the index vectors to be fed back, i.e., unit 120 selects the index vectors obtained as undesirable results.
- unit 120 creates the vector set associated with the feedback operation.
- One way of performing operation 26 is to let $r_{min}$ denote the minimum value of the similarity of the relevant index vector, and $r_{max}$ denote the maximum value of the similarity of the irrelevant index vector. In such a case, the relevant index vector whose last result similarity is less than $r_{max}$ (or whose first iteration result from block 14 is less than
- unit 120 calculates Equation (4) in accordance with:
- In Equation (4), S denotes the square sum matrix that unit 120 calculates during block 12.
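Equation (4) is not reproduced in this text. A plausible reading, given the parameters a and b mentioned in connection with it, is that the outer products of misjudged relevant vectors are added to S with weight a, and those of misjudged irrelevant vectors are subtracted with weight b. The sketch below implements that assumed form only; the function and argument names are hypothetical.

```python
def modify_square_sum(S, missed_relevant, false_relevant, a, b):
    """Assumed feedback update: S + a * sum(f f^T) - b * sum(g g^T).

    missed_relevant: relevant index vectors that were judged irrelevant.
    false_relevant:  irrelevant index vectors that were judged relevant.
    """
    n = len(S)
    S1 = [[float(x) for x in row] for row in S]
    for f in missed_relevant:
        for i in range(n):
            for j in range(n):
                S1[i][j] += a * f[i] * f[j]
    for g in false_relevant:
        for i in range(n):
            for j in range(n):
                S1[i][j] -= b * g[i] * g[j]
    return S1

# Toy example with made-up vectors and weights.
S1 = modify_square_sum([[1, 0], [0, 1]], [[1, 1]], [[0, 1]], a=0.5, b=0.25)
```

After such an update the eigenvalues and eigenvectors would be recomputed, and a and b tuned by the same evaluation-measure sweep used for the other parameters.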
- unit 120 performs processing which is the same as that of the relevance judging parameter determination step 14.
- unit 120 calculates an evaluation measure while varying the values of the parameters a and b, so as
- Unit 120 determines the values of a and b during operation 28 in the same way that the unit determines A and B during operation 18, Fig. 1B.
- After unit 120 performs step 28, the unit again performs step 25, to determine whether or not further feedback is necessary. Unit 120 repeats operations 24-28 in sequence until the unit performs retrieval step 15, after determining further feedback is not necessary or that F has converged.
- Fig. 3A is a flow diagram of steps general purpose computer 100 is programmed to take in connection with a third embodiment of the present invention.
- program 122 activates unit 120 to perform block 200, during which unit 120 executes the steps shown in Fig. 2.
- Memory 122 then, during operation 29, activates unit 120 and display output device 140, so the display provides a user with a visual indication of a retrieval result.
- Based on the displayed retrieval information, the user, during operation 34, decides whether or not feedback concerning the retrieval result is necessary by selectively supplying a signal to unit 120 via input 110. If the user decides feedback is necessary, the user activates input 110 and unit 120 responds to input 110 to advance to block 30.
- During block 30, the user responds to the output display 140 to activate input 110 to cause unit 120 to be supplied with a signal that indicates the displayed document is desirable or undesirable. Then unit 120 advances to block 31, during which unit 120 modifies a square sum matrix by using index vectors fi on the basis of query Q and the input from the user which input 110 supplied to unit 120 during operation 30. Then unit 120 advances to block 32
- unit 120 determines parameters L, a, ⁇ , a and
- Memory 122 then activates unit 120 to perform block 33, during which unit 120 retrieves a document from file 130.
- retrieval operation 33 is first executed in the same way as retrieval operation 15, per Figs. 1A and 2.
- Unit 120 activates output 140 to present the retrieval result to the user during retrieval result display step 29.
- the user evaluates the displayed result, and decides whether or not feedback is necessary. If (1) a document desired by the user is in the retrieval result, or (2) the user wants the retrieved documents which are relevant to the desired document to be re-retrieved, or (3) the user decides that an undesired document is erroneously in the retrieval result, the user designates such a desired or undesired document during user feedback selection step 30.
- Each document desired by the user is referred to as a "positive document,” whereas each undesired document is referred to as a "negative document.” Two or more of such documents can be designated.
- ⁇ nf are initially empty, i.e., have zero values.
- unit 120 computes the square sum matrix in accordance with Equation (5) :
- $a_1$ and $b_1$ are calculated parameters; the eigenvalues and eigenvectors of the matrix $S_2$ are obtained; and $S_1$ denotes the square sum matrix that unit 120 calculates during block 27 (Fig. 2).
- unit 120 performs the same processing that the unit performed during relevance judging parameter determination step 14. In addition, during step 32, unit 120 calculates evaluation measures for different values of parameters $a_1$ and $b_1$. Unit 120 determines the values of parameters $a_1$ and $b_1$ which result in evaluation measure F being maximized, in a manner similar to that described in connection with Fig. 1B. Step 32 differs from step 14 because during step 32 unit 120 (1) removes any index vector of any negative
- the retrieval step 33 is executed the
- During step 29, unit 120 supplies the retrieval result of step 33 to the display of output 140.
- the components of the index vectors of the positive and negative documents represent whether or not query terms included in a query are employed as index terms in the documents. However, all extant terms in each document can be adopted as the vector components.
- unit 120 modifies the square sum matrix and calculates the eigenvalues and eigenvectors of matrix $S_3$ in accordance with:
- the dimensions of the rows and columns of matrix $S_1$ in Equation (6) are usually smaller than the dimension of the vectors g. Therefore, after the size of the matrix $S_1$ is adjusted, the rows and columns need to be moved so that the i-th column and i-th row of matrix $S_1$ correspond to the same index term as the index term of the i-th components of vectors g. If the term corresponding to the i-th components of vectors g is not employed in a query, the i-th column and i-th row of the matrix $S_1$ are set to zero.
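The resizing and reordering of the query-term matrix so that it lines up with full-vocabulary document vectors can be sketched as follows. The function name, the term lists and the sample matrix are assumptions chosen for illustration.

```python
def expand_to_vocabulary(S1, query_terms, vocabulary):
    """Embed a query-term matrix into vocabulary order.

    Row/column i of the result corresponds to vocabulary[i]; rows and
    columns for terms that do not occur in the query are left at zero.
    """
    pos = {term: i for i, term in enumerate(query_terms)}
    n = len(vocabulary)
    out = [[0.0] * n for _ in range(n)]
    for i, ti in enumerate(vocabulary):
        for j, tj in enumerate(vocabulary):
            if ti in pos and tj in pos:
                out[i][j] = float(S1[pos[ti]][pos[tj]])
    return out

# Query terms "a", "b"; the document vocabulary orders them differently
# and contains an extra term "x" that the query does not use.
expanded = expand_to_vocabulary([[2, 1], [1, 3]], ["a", "b"], ["b", "x", "a"])
```

Once the matrix is aligned this way, outer products of the full-vocabulary vectors g can be added to it entry by entry.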
- unit 120 advances to relevance judging parameter determination step 32 which is executed in the same way as described in connection with operations 32 and 28 in Figs. 2 and 3B.
- unit 120 judges relevance as to each of the documents stored in file (i.e., database) 130, and outputs a retrieval result to output 140.
- unit 120 calculates similarity in accordance with Equation (7):
- In Equation (7), $g_i$ denotes the vector of document i, and the values determined as stated above are employed as parameters. The other points can be the same as in the prior art, in performing the present invention.
- Fig. 4 is a flow diagram of operations that unit 120 performs to retrieve documents in accordance with a fifth embodiment of the present invention. Initially, unit 120 performs the retrieval steps of Fig. 2, as indicated by block 200. Then, unit 120 supplies the retrieved documents to the display of output 140, as indicated by block 29. During operation 34, a user responds to the displayed documents and determines if feedback is necessary. Then, during operation 30, the user selects the desirable and undesirable documents. Then unit 120 advances in sequence to steps 41 and 42 during which the unit respectively calculates a feedback vector and executes document retrieval.
- unit 120 first calculates the similarity $r'_i$ between the feedback vector $g'$ and the index vector $g_i$ of a document to-be-retrieved, i.
- Unit 120 calculates r'i in accordance with Equation (9):
- unit 120 determines a similarity $r_i$ for the document to-be-retrieved, i, as a function, such as a weighted average.
- Unit 120 calculates the similarity $r_i$ for the document to-be-retrieved, i, by modifying block 15 (in Figs. 1A and 2), in accordance with Equation (10):
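The feedback-vector step might look like the sketch below. The equations for the feedback vector, the second similarity and the combined similarity are not reproduced in this text, so three assumptions are made: the feedback vector is the plain average of the positive documents' vectors, the second similarity is a cosine, and the combination is a simple weighted average with a hypothetical `weight` parameter.

```python
import math

def mean_vector(vectors):
    """Component-wise average of a non-empty list of equal-length vectors."""
    n = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(n)]

def cosine(u, v):
    """Cosine similarity (assumed measure for the second similarity)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def combined_similarity(r, r_feedback, weight):
    """Weighted average of the query similarity and the feedback similarity."""
    return (1.0 - weight) * r + weight * r_feedback

# Made-up vectors of two documents the user marked as desirable.
positive_docs = [[1, 1, 0], [1, 0, 0]]
feedback_vec = mean_vector(positive_docs)

# Second similarity for one candidate document, then the combined score.
r_feedback = cosine(feedback_vec, [1, 0, 0])
r_combined = combined_similarity(0.4, r_feedback, weight=0.25)
```

An average over negative documents could be handled analogously, for example by lowering the score of documents close to it; the text leaves the exact treatment to the elided equations.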
- unit 120 also sorts retrieved documents in accordance with the degrees of document relevance.
- the documents are displayed at output 140 during operation 29, so that the documents most relevant to the user's input query Q are displayed first and the least relevant are displayed last.
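Sorting the retrieved documents by degree of relevance is straightforward once each document has a similarity score. The sketch below filters by a threshold and orders the remainder from most to least relevant; the function name and sample scores are illustrative.

```python
def rank_documents(similarities, threshold):
    """Keep documents whose similarity exceeds the threshold, most relevant first."""
    kept = [(doc, r) for doc, r in similarities.items() if r > threshold]
    return sorted(kept, key=lambda item: item[1], reverse=True)

# Made-up similarity scores keyed by document identifier.
ranking = rank_documents({"a": 0.2, "b": 0.9, "c": 0.6}, threshold=0.5)
```

This ordering is what a purely Boolean match cannot provide, since it yields only a yes/no judgement per document.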
- Optimal values are determined while unit 120 varies the values of parameters $a_3$ and $b_3$. This can be performed by a technique having heretofore been practiced.
- the present invention is very effective as described below in connection with a first example wherein a query Q in Boolean logic form consisting of eight query terms, $w_1$ to $w_8$, is represented as:
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP02709727A EP1374094A4 (en) | 2001-03-02 | 2002-03-01 | Document and information retrieval method and apparatus |
US10/469,586 US7194461B2 (en) | 2001-03-02 | 2002-03-01 | Document and information retrieval method and apparatus |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2001/58899 | 2001-03-02 | ||
JP2001058899 | 2001-03-02 |
Publications (2)
Publication Number | Publication Date |
---|---|
WO2002071277A1 true WO2002071277A1 (en) | 2002-09-12 |
WO2002071277A9 WO2002071277A9 (en) | 2003-01-30 |
Family
ID=18918548
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/US2002/006053 WO2002071277A1 (en) | 2001-03-02 | 2002-03-01 | Document and information retrieval method and apparatus |
Country Status (3)
Country | Link |
---|---|
EP (1) | EP1374094A4 (en) |
CN (1) | CN1269064C (en) |
WO (1) | WO2002071277A1 (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8527448B2 (en) | 2011-12-16 | 2013-09-03 | Huawei Technologies Co., Ltd. | System, method and apparatus for increasing speed of hierarchial latent dirichlet allocation model |
CN102591917B (en) * | 2011-12-16 | 2014-12-17 | 华为技术有限公司 | Data processing method and system and related device |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5704060A (en) * | 1995-05-22 | 1997-12-30 | Del Monte; Michael G. | Text storage and retrieval system and method |
US5794178A (en) * | 1993-09-20 | 1998-08-11 | Hnc Software, Inc. | Visualization of information using graphical representations of context vector based relationships and attributes |
-
2002
- 2002-03-01 EP EP02709727A patent/EP1374094A4/en not_active Withdrawn
- 2002-03-01 WO PCT/US2002/006053 patent/WO2002071277A1/en active Application Filing
- 2002-03-01 CN CNB028093127A patent/CN1269064C/en not_active Expired - Fee Related
Non-Patent Citations (3)
Title |
---|
MCCABE ET AL.: "System fusion for improving performance in information retrieval systems", INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY: CODING AND COMPUTING, 2 April 2001 (2001-04-02), pages 639 - 643, XP002950854 * |
See also references of EP1374094A4 * |
WONG ET AL., EXTENDED BOOLEAN QUERY PROCESSING IN THE GENERALISED VECTOR SPACE MODEL, pages 47 - 63 |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2003060766A1 (en) * | 2002-01-16 | 2003-07-24 | Elucidon Ab | Information data retrieval, where the data is organized in terms, documents and document corpora |
US7593932B2 (en) | 2002-01-16 | 2009-09-22 | Elucidon Group Limited | Information data retrieval, where the data is organized in terms, documents and document corpora |
CN107609142A (en) * | 2017-09-21 | 2018-01-19 | 合肥集知网知识产权运营有限公司 | A kind of big data patent retrieval method based on Extended Boolean Retrieval model |
Also Published As
Publication number | Publication date |
---|---|
EP1374094A1 (en) | 2004-01-02 |
CN1507596A (en) | 2004-06-23 |
EP1374094A4 (en) | 2007-09-05 |
WO2002071277A9 (en) | 2003-01-30 |
CN1269064C (en) | 2006-08-09 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US7194461B2 (en) | Document and information retrieval method and apparatus | |
US7836057B1 (en) | Weighted preference inference system and method | |
US7634473B2 (en) | Document retrieval apparatus | |
US5826260A (en) | Information retrieval system and method for displaying and ordering information based on query element contribution | |
US5943669A (en) | Document retrieval device | |
US5687364A (en) | Method for learning to infer the topical content of documents based upon their lexical content | |
US5321833A (en) | Adaptive ranking system for information retrieval | |
US5870740A (en) | System and method for improving the ranking of information retrieval results for short queries | |
US5659766A (en) | Method and apparatus for inferring the topical content of a document based upon its lexical content without supervision | |
JP3664874B2 (en) | Document search device | |
US5787420A (en) | Method of ordering document clusters without requiring knowledge of user interests | |
US6804420B2 (en) | Information retrieving system and method | |
US20020016787A1 (en) | Apparatus for retrieving similar documents and apparatus for extracting relevant keywords | |
JP2002169834A (en) | Computer and method for making vector analysis of document | |
US20020198875A1 (en) | System and method for optimizing search results | |
US20070239707A1 (en) | Method of searching text to find relevant content | |
US20010049680A1 (en) | Information retrieval system, apparatus and method for selecting databases using retrieval terms | |
JPH07160731A (en) | Method and device for picture retrieval | |
US20080205795A1 (en) | System and methods of image retrieval | |
WO2008103961A1 (en) | Diverse topic phrase extraction | |
CN110069732B (en) | Information display method, device and equipment | |
US6721737B2 (en) | Method of ranking items using efficient queries | |
EP1374094A1 (en) | Document and information retrieval method and apparatus | |
JPH05101107A (en) | Device and method for narrowed-down data retrieval using adaption rate | |
US6915292B2 (en) | Method for updating multimedia feature information |
Legal Events
Code | Title | Description |
---|---|---|
AK | Designated states | Kind code of ref document: A1; Designated state(s): CN US |
AL | Designated countries for regional patents | Kind code of ref document: A1; Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR |
121 | Ep: the epo has been informed by wipo that ep was designated in this application | |
DFPE | Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101) | |
AK | Designated states | Kind code of ref document: C2; Designated state(s): CN US |
AL | Designated countries for regional patents | Kind code of ref document: C2; Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR |
COP | Corrected version of pamphlet | Free format text: PAGES 1/7-7/7, DRAWINGS, REPLACED BY NEW PAGES 1/7-7/7; DUE TO LATE TRANSMITTAL BY THE RECEIVING OFFICE |
WWE | Wipo information: entry into national phase | Ref document number: 10469586; Country of ref document: US |
WWE | Wipo information: entry into national phase | Ref document number: 2002709727; Country of ref document: EP |
WWE | Wipo information: entry into national phase | Ref document number: 028093127; Country of ref document: CN |
WWP | Wipo information: published in national office | Ref document number: 2002709727; Country of ref document: EP |