CN102331987A

CN102331987A - Patent data mining system and method

Info

Publication number: CN102331987A
Application number: CN2010102275015A
Authority: CN
Inventors: 管中徽; 刘显仲; 查士朝; 郑正元; 高振沧
Original assignee: Individual
Current assignee: Individual
Priority date: 2010-07-12
Filing date: 2010-07-12
Publication date: 2012-01-25

Abstract

The invention provides a patent data mining system and method. The system comprises a patent group acquiring unit, a citation analysis unit, and a centrality index acquiring unit. The patent group acquiring unit obtains a patent group. The citation analysis unit analyzes the citation relationships among the patents in the patent group and builds a citation network comprising a plurality of links, each link being formed from the citation relationship between two patents in the patent group that have a citation relationship; the citation analysis unit also obtains a cluster from the citation network and assigns a weight to each link in the cluster. The centrality index acquiring unit calculates the centrality index of each patent in the cluster according to the weight of each link.

Description

Patent data mining system and method

Technical field

The present invention relates to a data mining system and method, and in particular to a patent data mining system and method based on citation relationships.

Background art

In the era of the knowledge economy, filing patent applications to obtain exclusive rights is one of the key means by which enterprises stay competitive. Besides letting the owner keep exclusive control of its domain knowledge and guard against imitation or infringement by competitors, patents can also generate income through sale. However, once a large number of patents have been filed, the heavy maintenance fees they incur become a problem every applicant faces. To reduce expenditure and use limited funds effectively, limited resources must be spent on maintaining valuable patents. That is, if the value of a patent far exceeds the cost of maintaining it, the patent is naturally worth maintaining. Quickly locating the higher-value patents among a large number of patents, however, involves many aspects. Given this trend in the patent industry, intelligent mining of patent data is an important topic for industry operations.

In the prior art, research institutions and enterprises alike mostly assess patent value with the cost approach or the market approach. Such assessment requires considerable manpower for work such as data searching and industry analysis. When the number of patents is large, this approach is inefficient.

Moreover, the value of most patents is only truly realized through marketing, so valuations obtained through manual data searching and industry analysis also vary widely. As a result, estimates of the same patent's value often diverge sharply, which instead causes difficulty for companies in licensing negotiations.

In summary, there is an urgent need for an objective, quantitative, scientific, and automated method of mining data from a patent group that solves the problems of the prior art and reduces the cost of processing patent data. In particular, a fully automated technical solution should batch-process patents and locate target data automatically, thereby improving data processing efficiency.

Summary of the invention

The technical problem solved by the present invention is to display the citation relationships among patents and, based on these citation relationships, to mine a patent group automatically to obtain target data.

To solve the above technical problem, the invention discloses a patent data mining system, comprising:

a patent group acquiring unit, used to obtain a patent group;

a citation analysis unit, used to analyze the citation relationships among the patents of the patent group to build a citation network, the citation network having a plurality of links, each link being formed from the citation relationship between two patents in the patent group that have a citation relationship, the citation analysis unit obtaining a cluster from the citation network and assigning a weight value to each link in the cluster; and

a centrality index acquiring unit, used to calculate the centrality index of each patent in the cluster according to the weight value of each link.

a patent group acquiring unit, used to obtain a patent group;

a citation analysis unit, used to analyze the citation relationships among the patents of the patent group and to obtain a citation network from the patent group, the citation network having a plurality of links, each link being formed from the citation relationship between two patents in the patent group that have a citation relationship, and to assign a weight value to each link;

an image conversion unit, used to convert the citation network into a citation tree for display;

a centrality index acquiring unit, used to calculate the centrality index of each patent in the citation network according to the weight value of each link;

an automatic identification unit, used to evaluate each centrality index of the citation network against a predetermined threshold, to identify the patent whose centrality index reaches the predetermined threshold as target data, and to mark the target data in the citation tree.

To solve the above technical problem, the invention also discloses a patent data mining method, comprising at least the following steps:

obtaining a patent group using a patent group acquiring unit;

analyzing, using a citation analysis unit, the citation relationships among the patents of the patent group to build a citation network, the citation network having a plurality of links, each link being formed from the citation relationship between two patents in the patent group that have a citation relationship, the citation analysis unit obtaining a cluster from the citation network and assigning a weight value to each link in the cluster; and

calculating, using a centrality index acquiring unit, the centrality index of each patent in the cluster according to the weight value of each link.

obtaining a patent group using a patent group acquiring unit;

analyzing, using a citation analysis unit, the citation relationships among the patents of the patent group and obtaining a citation network from the patent group, the citation network having a plurality of links, each link being formed from the citation relationship between two patents in the patent group that have a citation relationship, and assigning a weight value to each link;

calculating, using a centrality index acquiring unit, the centrality index of each patent in the citation network according to the weight value of each link;

evaluating, using an automatic identification unit, each centrality index in the citation network against a predetermined threshold, and identifying the patent whose centrality index reaches the predetermined threshold as target data;

converting, using an image conversion unit, the citation network into a citation tree for display, and marking the target data in the citation tree.

The technical effect of the present invention is that the citation relationships among patents are displayed and, based on these citation relationships, a patent group is mined automatically to obtain target data. In particular, patents can be batch-processed and target data located automatically, thereby improving data processing efficiency.

Description of drawings

Fig. 1A is a functional block diagram of the patent data mining system 100 of the present invention;

Fig. 1B is a flowchart of the patent data mining method of the present invention;

Fig. 1C is a schematic diagram of another embodiment of the patent data mining system 100 of the present invention;

Fig. 1D is a flowchart of the patent data mining method of the present invention;

Fig. 1E is a structural diagram of the patent data mining system of the present invention in another embodiment;

Fig. 1F is a flowchart of an embodiment of the patent data mining method of the present invention;

Fig. 1G is a flowchart of an embodiment of the patent data mining method of the present invention;

Fig. 2 is a flowchart of a first embodiment of forming the citation network (cluster) of the present invention;

Fig. 2A is a functional block diagram of the patent data mining system 100 of the present invention;

Fig. 2B is a functional block diagram of the patent data mining system 100 of the present invention;

Fig. 3A is a schematic diagram of a search result obtained from a keyword;

Fig. 3B is a schematic diagram of retrieved patent groups related to the target patent P;

Figs. 4A to 4C are schematic diagrams of clusters of the citation network;

Figs. 5A and 5B are schematic diagrams of link weights computed with SPLC and SPNP, respectively;

Fig. 6 is a schematic diagram of a citation network (cluster) with weighted links;

Fig. 7 is a schematic diagram of relative value;

Figs. 8A to 8C are schematic diagrams of second to fourth embodiments of generating the citation network (cluster) of the present invention.

Wherein, Reference numeral:

2-patent data mining method

20～25-step

200～203-step

300～327-patent

S-patent group

S'-cluster

90, 91, 92-link

G1, G2, G3-cluster

100-patent data mining system

11-patent group acquiring unit

12-citation analysis unit

13-centrality index acquiring unit

14-automatic evaluation and identification unit

15-image conversion unit

111-updating unit

123-citation network selection unit

121-search path link count unit

122-search path node pair unit

Embodiment

So that the examiner may further recognize and understand the features, objects, and functions of the present invention, the relevant details of the method and the theoretical basis of its design are explained below:

The present invention batch-processes patents through a patent data mining system 100 to obtain target data, and displays the citation network that exists among the patents. The patent data mining system 100 can be installed on at least one data processing device. The data processing device has a hardware configuration commonly used in the prior art, such as a computer, workstation, or server, and includes a processor, storage, memory, a display device, an input device, a network interface, and so on. Refer to Fig. 1A, a functional block diagram of the patent data mining system 100 of the present invention.

The patent data mining system 100 comprises a patent group acquiring unit 11, a citation analysis unit 12, and a centrality index acquiring unit 13.

The patent group acquiring unit 11 is used to obtain a patent group. It can be connected to a database through a network and obtain the patent group directly from the database through a retrieval interface and relevant search conditions. Alternatively, the patent group can be read from storage connected to the patent group acquiring unit 11, for example a hard disk.

The citation analysis unit 12 performs citation analysis on each patent of the patent group obtained by the patent group acquiring unit 11 and builds a citation network from the citation relationships among the patents of the patent group. The citation network has a plurality of links, each formed from the citation relationship between two patents in the patent group that have a citation relationship. The citation analysis unit 12 also obtains a cluster from the citation network and assigns a weight value to each link in the cluster.

Any two patents may have a citation relationship. A single citation is enough to form a link, and the mutual citations among multiple patent applications ultimately form a citation network.

The centrality index acquiring unit 13 calculates the centrality index of each patent in the citation network according to the weight value of each link. The centrality index represents the importance of each patent.

Refer to Fig. 1B, a flowchart of the patent data mining method of the present invention.

Step A: obtain a patent group using the patent group acquiring unit;

Step B: analyze, using the citation analysis unit, the citation relationships among the patents of the patent group to build a citation network, the citation network having a plurality of links, each link being formed from the citation relationship between two patents in the patent group that have a citation relationship; the citation analysis unit obtains a cluster from the citation network and assigns a weight value to each link in the cluster; and

Step C: calculate, using the centrality index acquiring unit, the centrality index of each patent in the cluster according to the weight value of each link.

Fig. 1C is a schematic diagram of another embodiment of the patent data mining system 100 of the present invention. In this embodiment the system is basically similar to that of Fig. 1A; the difference is that the system 100 also has an automatic evaluation and identification unit 14, used to identify target data through data mining.

The automatic evaluation and identification unit 14 can be implemented as follows:

First, based on a target patent in the cluster that is preset as having value, determine a relative evaluation index of the centrality index of the target patent against the centrality indices of the patents in the cluster, and thereby identify other patents in the cluster that have value as target data.

Second, calculate the value of a target patent in the cluster from the ratio between its centrality index and the centrality index of a patent in the cluster whose value has been determined.

Third, evaluate each centrality index in the citation network against a predetermined threshold, and identify the patent whose centrality index reaches the predetermined threshold as target data. The target data is thus obtained by further screening, through data mining, the originally obtained patent group.
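The third, threshold-based mode described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `identify_targets`, the patent labels, the scores, and the threshold are all made up for the example and do not come from the specification.

```python
def identify_targets(centrality, threshold):
    """Return the patents whose centrality index reaches the threshold."""
    return sorted(p for p, score in centrality.items() if score >= threshold)


# Hypothetical centrality indices for four patents in a cluster.
centrality = {"P1": 0.42, "P2": 0.08, "P3": 0.31, "P4": 0.19}
targets = identify_targets(centrality, threshold=0.30)
print(targets)  # ['P1', 'P3']
```

"Reaches" is read here as greater than or equal to the threshold; the text does not say whether the comparison is strict.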

Refer to Fig. 1D, a flowchart of the patent data mining method of the present invention.

Step A: obtain a patent group using the patent group acquiring unit;

Step B: analyze, using the citation analysis unit, the citation relationships among the patents of the patent group to build a citation network, the citation network having a plurality of links, each link being formed from the citation relationship between two patents in the patent group that have a citation relationship; the citation analysis unit also obtains a cluster in the patent group and assigns a weight value to each link in the cluster;

Step C: calculate, using the centrality index acquiring unit, the centrality index of each patent in the cluster according to the weight value of each link;

Step D: identify the target data using the automatic evaluation and identification unit.

Fig. 1E is a structural diagram of the patent data mining system of the present invention in another embodiment.

Here, the system further comprises an image conversion unit 15, used to convert the citation network into a citation tree for display. In addition, the screened-out target data can be specially marked on the citation tree.

Refer to Fig. 1F, a flowchart of the patent data mining method of the present invention.

Step A: obtain a patent group using the patent group acquiring unit;

Step D: evaluate, using the automatic identification unit, each centrality index in the citation network against a predetermined threshold, and identify the patent whose centrality index reaches the predetermined threshold as target data;

Step E: convert, using the image conversion unit, the cluster into a citation tree for display, and mark the target data in the citation tree.

The present invention is described in detail below. Refer to Fig. 1G, a flowchart of an embodiment of the patent data mining method of the present invention. In this embodiment, the method 2 includes the following steps. First, in step 20, a patent group is provided and a citation network is built from it. The citation network has a plurality of links, each formed by any two patents having a citation relationship. A cluster in the citation network is then selected, and the selected cluster serves as the basis for subsequent processing.

Refer to Fig. 2, a flowchart of a first embodiment of building the citation network of the present invention. First, in step 200, a patent group having a plurality of patents is retrieved from a database according to at least one search condition. In this step, the database can be the patent database of a national intellectual property office, for example the United States patent database or the European patent database. The database can also be one built by commercial software, for example the Delphion patent database, but is not limited to these. In step 20, the search condition can include a keyword, an International Patent Classification (IPC) code, an application number, a publication number, an issue number, an applicant, an inventor, or any combination of the above, but is not limited to these. The keyword can come from a keyword list stored in advance.

The result of step 20 is shown in Fig. 3A, a schematic diagram of a search result obtained from a keyword. The set of patents obtained by the search is the patent group S. Note that although Fig. 3A lists United States patent issue numbers, each patent can be a published patent application or a granted patent publication and is not limited to United States patents. However, among the free patent databases currently available, only the full-text data of the United States granted patent database has complete citation data, so the method proposed by the present invention is currently applicable only to granted United States patents.

Returning to Fig. 2, after step 200, step 201 determines whether the patent group S is a suitable group.

One way to judge suitability is by the number of patents the patent group S contains. If there are too few, the search condition used in step 200 may be inappropriate, so it is necessary to return to step 200, redefine the search condition, and search again. That is, a patent count is set in advance and the number of retrieved patents is compared with it: if the count is not reached, the search condition is changed and the search repeated; if it is reached, the subsequent steps are carried out. Changing the search condition can include choosing another keyword from the keyword list, or directly choosing search conditions other than keywords and combinations thereof.

Another approach is to first build a benchmark set. The benchmark set contains at least one target patent P known to be relevant. Each time the search condition is adjusted and the suitability of the search result is judged, the retrieved patent group is checked for whether it has incorrectly excluded or filtered out patents in the benchmark set. If patents in the benchmark set have been excluded or filtered out, the search condition is deemed to need adjustment. The target patent P can be a published patent application or a granted patent publication.

Refer to Fig. 2A, a functional block diagram of the patent data mining system 100 of the present invention.

Specifically, in this embodiment the patent group acquiring unit 11 further comprises an updating unit 111 that performs step 201. The updating unit 111 judges whether the retrieved patent group contains the target patent (and/or whether the retrieved patent group reaches a predetermined patent count range). If not, it adjusts the search condition and produces a new patent group, until the patent group produced contains the target patent (and/or reaches the predetermined patent count range).

For example, let the hatched region A of Fig. 3B represent the optimal patent group. With an incorrect search condition, the patent group obtained may be like region C or D, which omits some patents. With a suitable search condition, the patent group obtained is like region B or E, which contains region A.
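The check-and-retry behaviour of step 201 can be sketched as a simple loop. This is a minimal sketch under stated assumptions: `refine_search`, the `search` callable, the in-memory `fake_db`, the keyword strings, and the patent labels are all hypothetical stand-ins for a real patent database query, and both suitability tests (minimum count and benchmark set) are applied together here although the text allows either or both.

```python
def refine_search(search, conditions, benchmark, min_count):
    """Try search conditions in order until the retrieved group is suitable.

    A group is treated as suitable when it holds at least `min_count`
    patents and contains every patent of the benchmark set.
    """
    for condition in conditions:
        group = set(search(condition))
        if len(group) >= min_count and benchmark <= group:
            return condition, group
    return None, set()


# Hypothetical stand-in for a patent database: search condition -> patents.
fake_db = {
    "connector": {"US1", "US2"},
    "electrical connector": {"US1", "US2", "US3", "P"},
}
condition, group = refine_search(
    lambda c: fake_db.get(c, set()),
    ["connector", "electrical connector"],
    benchmark={"P"},
    min_count=3,
)
print(condition, sorted(group))
```

The first condition is rejected because it returns too few patents and misses the benchmark patent, so the loop moves on to the next condition in the list.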

Returning to Fig. 2: if the patent group S is judged suitable, then in step 202, according to the search result of Fig. 3A, citation associations are formed from the published applications or granted publications cited in the content of each patent in the patent group, together with the related art cited by the examiner during examination, and a link is set up between every two patents having a citation relationship to form a citation network. When patent A is cited by patent B, a piece of knowledge in patent A is basically "relevant" to patent B; for example, one may be an improvement of the other, or the two may adopt different approaches to the same problem. From the perspective of information flow, this piece of knowledge can be imagined as flowing from patent A to patent B. In the link representation, patent A and patent B are therefore two nodes, and the link between A and B is drawn as an arrow from A to B, indicating that a piece of knowledge flows from A to B. Note that the aforementioned citation relationship can use either forward citation or backward citation.
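The link construction of step 202 can be illustrated with a short Python sketch. It assumes the citation data is available as a mapping from each citing patent to the patents it cites; that input format, the function name `build_citation_network`, and the patent labels are assumptions made for the example. Links are stored in the direction described above, from the cited patent to the citing patent.

```python
from collections import defaultdict


def build_citation_network(references, group):
    """Build directed links (cited -> citing) among patents of the group.

    `references` maps a citing patent to the patents it cites. Citations
    that leave the patent group are ignored.
    """
    network = defaultdict(set)
    for citing, cited_list in references.items():
        if citing not in group:
            continue
        for cited in cited_list:
            if cited in group and cited != citing:
                network[cited].add(citing)  # knowledge flows cited -> citing
    return dict(network)


# Hypothetical data: B cites A; C cites A and B; D cites X (outside the group).
references = {"B": ["A"], "C": ["A", "B"], "D": ["X"]}
network = build_citation_network(references, group={"A", "B", "C", "D"})
print(network)
```

Patent D ends up with no link because its only citation points outside the retrieved patent group.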

Figs. 4A to 4C are schematic diagrams of the citation network. According to the search result of Fig. 3A, the citation network obtained in step 202 contains a plurality of clusters, three in this embodiment (Figs. 4A to 4C). Any two patents in a cluster must have a direct or indirect citation relationship. Returning to Fig. 2, step 203 then selects a suitable cluster from the clusters contained in the citation network, according to the number of patents each cluster contains or whether it contains the target patent P.

For example, take US Pat. No. 4,310,211 as the target patent P. When selecting a suitable cluster from the plurality of clusters as the citation network, there are mainly three situations. In the first, the target patent P is not in any cluster at all, for example when the target patent P cannot be found anywhere in Figs. 4A to 4C.

In the second, the target patent P does belong to one of the clusters, but that cluster contains very few patents; Figs. 4B and 4C, for example, are small clusters. In the third, the target patent is in a cluster of suitable size, such as the cluster of Fig. 4A. The first and second situations indicate that the search condition is inappropriate, so it is necessary to return to step 20 and search again. In this embodiment, the target patent P does not appear in the clusters of Figs. 4B and 4C, so those clusters can be rejected. The cluster containing the target patent P is of suitable size, so that cluster (Fig. 4A) is used as the citation network and as the basis of the subsequent analysis. Note that a citation network can comprise one or more clusters, and a cluster can be regarded as a sub-network of the citation network. In other words, a cluster is itself a sub-network. In this specification, "sub-network" and "cluster" are in fact synonyms.
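One way to sketch step 203 is to treat each cluster as a weakly connected component of the directed citation network, which matches the requirement that any two patents in a cluster be directly or indirectly related by citation. The specification does not name an algorithm, so the depth-first grouping below, the function names, the labels, and the minimum size are all assumptions for illustration.

```python
from collections import defaultdict


def find_clusters(links):
    """Group nodes into clusters, ignoring link direction."""
    neighbours = defaultdict(set)
    for src, dst in links:
        neighbours[src].add(dst)
        neighbours[dst].add(src)
    seen, clusters = set(), []
    for start in list(neighbours):
        if start in seen:
            continue
        stack, cluster = [start], set()
        while stack:
            node = stack.pop()
            if node in cluster:
                continue
            cluster.add(node)
            stack.extend(neighbours[node] - cluster)
        seen |= cluster
        clusters.append(cluster)
    return clusters


def select_cluster(clusters, target, min_size):
    """Return the cluster that holds the target patent and is large enough."""
    for cluster in clusters:
        if target in cluster and len(cluster) >= min_size:
            return cluster
    return None  # corresponds to the first or second situation: search again


# Hypothetical links (cited, citing); "P" stands for the target patent.
links = [("A", "B"), ("B", "P"), ("A", "P"), ("C", "D"), ("E", "F")]
clusters = find_clusters(links)
print(select_cluster(clusters, target="P", min_size=3))
```

A `None` result covers both rejected situations: the target patent is in no cluster, or its cluster is too small.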

Fig. 2B is a functional block diagram of the patent data mining system 100 of the present invention.

In this embodiment, the citation analysis unit 12 further comprises a cluster selection unit 123, which selects one of the plurality of clusters formed from the patent group according to a selection condition. The selection condition comprises a patent count and/or a target patent: the cluster selection unit selects the cluster that meets the patent count and/or the cluster that contains the target patent.

Returning to Fig. 1G: after step 20 selects at least one suitable cluster, step 21 assigns a weight value to each link in the cluster. There are many ways to assign weights in step 21. For example, every link can have a weight value of 1, or the weight of each link can be calculated with the search path link count (SPLC) method or the search path node pair (SPNP) method.

In one embodiment, as shown in Fig. 2B, the citation analysis unit 12 comprises a search path link count unit 121 or a search path node pair unit 122 (not shown). The search path link count unit 121 assigns a weight value to each link according to the search path link count method, and the search path node pair unit 122 assigns a weight value to each link according to the search path node pair method.

Refer to Figs. 5A and 5B, schematic diagrams of link weights computed with SPLC and SPNP, respectively. SPLC is described first. In Fig. 5A, there are two nodes A and B on the right side of link 90 (each node represents a granted patent publication or a published patent application), and four nodes E, G, I, and J can be reached on the left side of link 90, so the weight of link 90 is 4 × 2 = 8. With SPNP, as shown in Fig. 5B, the weight of link 91 is calculated as follows: for node D of link 91, three nodes A to C on its right can pass through link 91 to reach node D; and from node C, seven nodes D to J can be reached through link 91. The weight of link 91 is therefore 3 × 7 = 21. The weights calculated in step 21 are shown in Fig. 6.

Returning to Fig. 1G, step 22 then calculates a centrality index for each patent in the selected suitable cluster S' according to the weight value of each link.
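The SPNP arithmetic described above can be reproduced with a small Python sketch. It covers SPNP only and assumes an acyclic cluster. The weight of a link is taken as the number of nodes that can reach the link's tail (tail included) multiplied by the number of nodes reachable from its head (head included). The function name and the topology below are hypothetical: Fig. 5B is not available here, so the graph was merely chosen so that link C to D has three upstream nodes (A to C) and seven downstream nodes (D to J).

```python
from collections import defaultdict


def spnp_weights(links):
    """Search path node pair weight for each directed link (src, dst)."""
    forward, backward = defaultdict(set), defaultdict(set)
    for src, dst in links:
        forward[src].add(dst)
        backward[dst].add(src)

    def reach(start, graph):
        # All nodes reachable from `start` in `graph`, `start` included.
        seen, stack = {start}, [start]
        while stack:
            for nxt in graph[stack.pop()]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    return {
        (src, dst): len(reach(src, backward)) * len(reach(dst, forward))
        for src, dst in links
    }


# Hypothetical acyclic cluster, not a reproduction of Fig. 5B.
links = [("A", "C"), ("B", "C"), ("C", "D"), ("D", "E"), ("D", "F"),
         ("E", "G"), ("F", "H"), ("G", "I"), ("H", "J")]
weights = spnp_weights(links)
print(weights[("C", "D")])  # 21, i.e. 3 upstream nodes x 7 downstream nodes
```

Each weight costs two graph traversals, which is adequate for a sketch; a larger cluster would benefit from caching the reachable sets per node.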

The centrality index of this embodiment is eigenvector centrality (EC). A citation network formed by patent citations can be regarded as a social network in a broad sense, and centrality is an important concept in social network analysis, used to measure the importance of each node in a network. The eigenvector centrality used in this embodiment can be applied to directed, weighted networks. Eigenvector centrality also has a distinctive property: the importance of a node is proportional to the importance of all the nodes pointing to it. In a patent citation network, for example, a patent cited by an important patent gains significantly in importance. A patent with high centrality is therefore determined not by its citation count but by the prominence of its position in the citation network.

Next, the method by which the centrality index acquiring unit 13 calculates the centrality index is explained. The concept of eigenvector centrality is as follows. Suppose there is a network of n nodes with a weight matrix A formed by its links. Each element A_{jk} of matrix A represents the weight of the link from node k pointing to node j. The diagonal elements of the matrix are all zero. Suppose further that there is a vector I of n elements, where each element I_{j} represents the importance (rank score) of the corresponding node j. The idea of eigenvector centrality (EC) is that the importance of each node is proportional to the sum, over all nodes pointing to it, of their importance multiplied by the weight of the corresponding link, as shown in formula (1):

c \cdot I_{j} = \sum_{k} A_{jk} \cdot I_{k} \qquad (1)

where c is a proportionality constant. Written in matrix notation, formula (1) becomes formula (2):

c \cdot I = A \cdot I \qquad (2)

Formula (2) shows that c is an eigenvalue of matrix A and I is an eigenvector. Formula (2) yields n solutions, of which the eigenvector I corresponding to the largest eigenvalue carries the index meaning. Taking the cluster S' with link weights built in Fig. 6 as an example, the eigenvector I in formula (2) is the vector formed by the nodes that each represent a patent. Following the index convention of formula (1), each element A_{ji} of matrix A represents the weight of the link 92 in Fig. 6 that goes from node i to node j. The diagonal elements of matrix A represent a node pointing to itself and are therefore zero.

Formula (2) is usually solved iteratively. That is, an initial value is first assigned to every element of the eigenvector I. Because the elements of the matrix A are known (for example, the value corresponding to each link in Fig. 6), the eigenvalue c can be calculated. The values of all elements of the next eigenvector I are then adjusted, and the computation is repeated iteratively until the eigenvector I converges to a fixed value. Each eigenvector I that converges to a fixed value corresponds to an eigenvalue c. The values of the elements (centrality indices) of the eigenvector I corresponding to the largest of these eigenvalues c are therefore taken as the value or importance of each node in Fig. 6. Because the values of the eigenvector I obtained by solving formula (2) correspond to the importance or value of each node (patent), the relative value and importance of each node with respect to the settlement S' can be learned from them.
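The iterative solution of formula (2) can be sketched with power iteration, a standard technique used here as a stand-in because the text does not name a specific iterative scheme. The function name `eigenvector_centrality`, its parameters and the 3-node weight matrix are hypothetical and are not taken from Fig. 6. The sketch assumes the matrix has a dominant positive eigenvalue; for a strictly acyclic citation network the iteration would collapse to the zero vector, which the code reports as an error.

```python
def eigenvector_centrality(A, max_iter=1000, tol=1e-10):
    """Power iteration for c * I = A * I (formula (2)).

    A[j][k] is the weight of the link from node k to node j.
    Returns (c, I), with I scaled so that its elements sum to 1.
    """
    n = len(A)
    I = [1.0 / n] * n  # give every element an initial value
    for _ in range(max_iter):
        # Formula (1): new_I[j] = sum over k of A[j][k] * I[k]
        new_I = [sum(A[j][k] * I[k] for k in range(n)) for j in range(n)]
        total = sum(new_I)
        if total == 0:
            raise ValueError("iteration collapsed to the zero vector")
        c = total  # since sum(I) == 1, sum(A * I) estimates the eigenvalue
        new_I = [x / total for x in new_I]
        if max(abs(a - b) for a, b in zip(new_I, I)) < tol:
            return c, new_I
        I = new_I
    raise RuntimeError("power iteration did not converge")


# Hypothetical weighted network of three nodes (zero diagonal).
A = [
    [0.0, 2.0, 1.0],
    [2.0, 0.0, 1.0],
    [1.0, 1.0, 0.0],
]
c, I = eigenvector_centrality(A)
```

In this example nodes 0 and 1 are linked by the heaviest weight and end up with equal centrality, higher than that of node 2.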

Return shown in Fig. 1 G; After the step 22; Also comprise step 23, utilize this automatic Evaluation and recognition unit 14 to judge whether that patent case has known value (that is, whether in storer, search relevant for the patent case in the patent group associated value record); If it's not true, the relative evaluation index of centrality index in this settlement that is then had with a target patent case that is had in the step 24 calculating settlement.As shown in Figure 7, this figure is the relative worth synoptic diagram.For example: the pairing value of all elements in the proper vector I can be set up out a percentile rank (percentile rank, PR) table.Therefore this tabulation demonstrates in the settlement, and the drop point site of each patent can be learnt by this table, centrality desired value and drop point site that the target patent case is had, and then can judge this target patent case for importance that this settlement had.In other words; The distribution of the centrality desired value all centrality desired values in the settlement that had for target patent case P; For example: the proper vector centrality value of target patent case is to be positioned at precedingly 10%, and an objective appraisal can be arranged, and represents the relative importance or the value of target patent case representative in quoting network as proof; That is, assert that be positioned at preceding 10% patent case is target data.Otherwise; If in step 23; If there is patent case to have known definite value, then can step 25 utilize centrality index and this of this target patent case to have the ratio of centrality desired value between the patent case of confirming to be worth, calculate the value that this target patent case is had; That is, this has the value multiple of the patent case of confirming value to this target patent case relatively.

Steps 20 to 25 of the foregoing patent valuation method can be stored on a storage medium as a computer program, and when the computer program is loaded into and executed by a computer, the patent data mining method of the present invention is carried out. The computer may be a server, a workstation, a personal desktop computer or a mobile computer. The storage medium may be an optical disc, a hard disk, a memory or the like.

As described above, in the method proposed by the present invention, steps 200 to 203 within step 20 find a patent group by retrieval, build the citation network and select a suitable settlement from it, adjusting and selecting along the way according to the number of patent cases or whether the target patent case is included. In another embodiment of the present invention, step 20 determines the settlement by expanding directly from at least one target patent case. For example, the settlement G1 shown in Fig. 8A, formed by a first-order expansion from the target patent case P, is the set comprising the target patent case P, all patent cases 300 to 302 directly cited by the target patent case P (so-called backward citations), and all patent cases 303 to 305 that directly cite the target patent case P (so-called forward citations). The settlement G2 shown in Fig. 8B, formed by a second-order expansion from the target patent case P, is the set of patent cases 300 to 327 obtained by performing at least one further order of forward and backward expansion on each of the patent cases 300 to 305 in the first-order settlement G1.

In another embodiment, shown in Fig. 8C, the second-order expansion includes, for each backward-cited patent case 300 to 302 of the target patent case P, its backward-cited patent cases of at least one order 306, 308 and 312 to 315 (but not the cases that forward-cite it), and, for each forward-citing patent case 303 to 305 of the target patent case P, its forward-citing patent cases of at least one order 318, 319, 321, 323, 324, 326 and 327 (but not the cases it backward-cites). This embodiment expands outward like a ripple to form the settlement G3.

As described above, whichever expansion mode is adopted, the present embodiment can take the settlement formed by a multi-order expansion from at least one target patent case as the basis for subsequent analysis. Steps 22 to 25 are the same as in the previous embodiment and are not repeated here.
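The two expansion modes can be sketched as follows. This is an illustration under stated assumptions: the citation data is held in a hypothetical dictionary `cites` that maps each patent case to the set of patent cases it cites, and the function names and patent identifiers are invented for the example rather than taken from Figs. 8A to 8C.

```python
def reverse_links(cites):
    """Map each patent case to the set of patent cases that cite it."""
    cited_by = {}
    for citing, cited_set in cites.items():
        for cited in cited_set:
            cited_by.setdefault(cited, set()).add(citing)
    return cited_by


def expand_full(cites, target, order):
    """Expand forward and backward from every case at each order."""
    cited_by = reverse_links(cites)
    group, frontier = {target}, {target}
    for _ in range(order):
        nxt = set()
        for p in frontier:
            nxt |= cites.get(p, set()) | cited_by.get(p, set())
        frontier = nxt - group
        group |= frontier
    return group


def expand_ripple(cites, target, order):
    """Ripple mode: backward-cited cases expand only backward, and
    forward-citing cases expand only forward."""
    cited_by = reverse_links(cites)
    group = {target}
    backward, forward = {target}, {target}
    for _ in range(order):
        backward = {c for p in backward for c in cites.get(p, ())}
        forward = {c for p in forward for c in cited_by.get(p, ())}
        group |= backward | forward
    return group


# Hypothetical data: P cites B1, F1 cites P and Y, and so on.
cites = {
    "P": {"B1"},
    "F1": {"P", "Y"},
    "B1": {"B2"},
    "X": {"B1"},
    "F2": {"F1"},
}
g1 = expand_full(cites, "P", 1)
g2 = expand_full(cites, "P", 2)
g3 = expand_ripple(cites, "P", 2)
```

In this example the ripple expansion leaves out X (which cites the backward-cited case B1) and Y (which is cited by the forward-citing case F1), while the full second-order expansion includes both.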

After the centrality index of each patent case is obtained, the automatic evaluation and recognition unit 14 also judges each centrality index of the citation network against a predetermined threshold and identifies the patent cases whose centrality index reaches the predetermined threshold as target data, thereby realizing data mining. The target data can then be specially marked in the subsequent display process.

For example, the patent cases in the top 10% (0.9-1.0) in Fig. 7 are identified as target data. Alternatively, each centrality index obtained is checked individually against a predetermined threshold, and if it reaches the threshold, the corresponding patent case is identified as target data.

In addition, the image conversion unit 15 of the present invention can further display the citation network graphically on a display device. That is, the image conversion unit 15 converts the citation network into a citation tree, for example a star tree, and displays it graphically through a GUI. In the displayed graphical citation network, the target data mined through the foregoing steps are specially marked.

Specifically, at any time after step 20, the image conversion unit 15 receives the link data of the patent cases obtained from the analysis by the citation analysis unit 12.

The image conversion unit 15 includes a node generation module, which converts each patent case mentioned in the link data obtained by the citation analysis unit 12 into a patent node data structure.

The image conversion unit 15 also includes a citation tree generation module, which forms the patent citation tree from the nodes and the link data between nodes. The link data between nodes is, for example, a pointer or cursor to the ID of a child node, and appears in the patent citation tree as a line connecting two nodes.

The patent citation tree comprises a plurality of node data structures. Each node data structure includes an ID that uniquely identifies the node and child-node links pointing to its child nodes. The child nodes that a node points to are the patents cited by the current patent, and the parent nodes that point to the current node are the patents that cite the current patent.
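One possible shape for such a node data structure is sketched below. The class `PatentNode`, its field names, the helper functions and the `is_target` flag for marking mined target data are all assumptions made for illustration; the text specifies only a unique ID and child-node links.

```python
from dataclasses import dataclass, field


@dataclass
class PatentNode:
    node_id: str                                   # unique ID of the node
    children: list = field(default_factory=list)   # IDs of cited patents
    is_target: bool = False                        # assumed display flag


def build_citation_tree(links, targets=()):
    """Build nodes from (citing, cited) pairs of link data."""
    nodes = {}
    for citing, cited in links:
        for pid in (citing, cited):
            if pid not in nodes:
                nodes[pid] = PatentNode(pid)
        nodes[citing].children.append(cited)
    for pid in targets:
        if pid in nodes:
            nodes[pid].is_target = True
    return nodes


def parents_of(nodes, node_id):
    """IDs of the nodes that point to `node_id`, i.e. the citing patents."""
    return sorted(n.node_id for n in nodes.values() if node_id in n.children)


# Hypothetical link data: P cites 300 and 301, and 303 cites P.
links = [("P", "300"), ("P", "301"), ("303", "P")]
nodes = build_citation_tree(links, targets={"P"})
```

Storing only child IDs keeps each node small; parent links are derived on demand here, although a display module could equally cache them.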

The displayed patent citation tree may look as shown in Fig. 4A.

Meanwhile, the target data obtained through the earlier data mining can be shown with highlighting, enlarged fonts or added special marks to indicate that they are patent cases of higher importance. The user can thus intuitively obtain the target data within the retrieved patent group.

The technical effect achieved by the present invention is to show the citation relations between patent cases and, based on these citation relations, to obtain for a patent group both the citation relations between its patent cases and the relative importance or value of each patent.

A further technical effect achieved by the present invention is automatic data mining to obtain target data. In particular, patent cases can be processed in batches and target data located automatically, which improves the efficiency of data processing.

The above are merely embodiments of the present invention and shall not limit the scope of the invention. Equivalent changes and modifications made according to the claims of the present invention, which neither lose the gist of the present invention nor depart from its spirit and scope, shall all be regarded as further implementations of the present invention.

Claims

1. A patent data mining system, characterized in that it comprises:

A patent group acquiring unit, used to obtain a patent group;

2. The system of claim 1, characterized in that the citation analysis unit comprises a search path link count unit or a search path node pair unit, wherein the search path link count unit assigns a weight to each link according to the search path link count method, and the search path node pair unit assigns a weight to each link according to the search path node pair method.

3. The system of claim 1, characterized in that the centrality index is eigenvector centrality.

4. The system of claim 1, characterized in that the citation analysis unit further comprises:

A settlement selection unit, used to select, according to a selection condition, one of a plurality of settlements formed from the citation network, wherein the selection condition comprises a number of patent cases or a target patent case, and the settlement selection unit selects the settlement that matches the number of patent cases, or selects the settlement that includes the target patent case.

5. The system of claim 4, characterized in that the system further comprises an automatic evaluation and recognition unit, used to evaluate a relative evaluation index, within the settlement, of the centrality index of the target patent case, wherein the relative evaluation index is the percentile rank of the centrality index of the target patent case.

6. The system of claim 4, characterized in that the system further comprises an automatic evaluation and recognition unit, which uses the ratio between the centrality index of the target patent case and the centrality index of a patent case in the settlement having a determined value to calculate the value of the target patent case.

7. The system of claim 4, characterized in that the patent group is a set of a plurality of patent cases retrieved from a database according to at least one search condition.

8. The system of claim 7, characterized in that the patent group acquiring unit further comprises an updating unit, which determines whether the retrieved patent group includes the target patent case and, if not, adjusts the search condition and generates a patent group again, until the generated patent group includes the target patent case.

9. The system of claim 4, characterized in that the patent group is the set formed from the target patent case by a forward-citation expansion of at least one order and a backward-citation expansion of at least one order.

10. The system of claim 9, characterized in that, when expanding from a patent case directly or indirectly backward-cited by the target patent case, the patent group includes only the patent cases backward-cited by that patent case; or, when expanding from a patent case that directly or indirectly forward-cites the target patent case, the patent group includes only the patent cases that forward-cite that patent case.

11. A patent data mining method, applied in the system of claim 1, characterized in that it comprises at least the following steps:

Using the patent group acquiring unit to obtain a patent group;