CN104008171A

CN104008171A - Legal database establishing method and legal retrieving service method

Info

Publication number: CN104008171A
Application number: CN201410242810.8A
Authority: CN
Inventors: 刘婕; 张程; 赵晓芳
Original assignee: Institute of Computing Technology of CAS
Current assignee: Institute of Computing Technology of CAS
Priority date: 2014-06-03
Filing date: 2014-06-03
Publication date: 2014-08-27

Abstract

The invention provides a legal database establishing method, which comprises: 1) for a new legal text, splitting the received text by item to obtain the corresponding legal item documents and creating a unique identifier for each; 2) segmenting each legal item document into words and, for each resulting word item, creating or updating the unique record for that word item in a content-based inverted index, where each record of the content-based inverted index contains every legal item document whose content includes the record's word item, together with the corresponding index information; 3) returning to step 1) to process the next legal text until all legal texts have been processed. The invention also provides a corresponding retrieval service method. With these methods, retrieval results that are precise to the level of individual legal items can be obtained in a single retrieval.

Description

Legal database construction method and legal retrieval service method

Technical field

The present invention relates to computer text information retrieval, and in particular to a legal database construction method and a legal retrieval service method.

Background

Information retrieval is the process of organizing and storing records of information in a certain way and then finding the relevant information according to users' needs. Information retrieval technology makes it easier for people to find the knowledge they need in massive amounts of data and improves the efficiency of knowledge acquisition.

A legal retrieval system applies information retrieval technology to the texts of laws and regulations. It helps the staff of People's Congress offices at all levels, Party and government organs, and legal institutions such as courts, procuratorates and law firms to find the laws and regulations they need quickly. Legal retrieval systems also provide legal retrieval services to the public.

Current legal retrieval systems, such as the "Chinese Laws and Regulations Retrieval System" of the National People's Congress and "Beida Fabao" (PKULaw) of Peking University, all retrieve on the full text of laws and regulations combined with metadata such as title, date, issuing department, category, level of legal effect and validity status, and the results they return use an entire law or regulation as the basic unit. However, users usually need to find the specific legal provisions that may apply to the facts of a case, so after obtaining the results they still have to search through them for the relevant provisions themselves.

On the other hand, users usually want to find all the laws relevant to the facts of a case, but current legal retrieval relies on exact keyword matching. If a keyword is not precise enough, the results may miss relevant laws that fall outside the retrieved set. To find more relevant laws, users therefore often have to try several keywords or keyword combinations and retrieve repeatedly before they finally find the multiple relevant law entries they need. The convenience of existing legal retrieval therefore urgently needs improvement.

There is therefore an urgent need for a legal retrieval service scheme that helps users find the laws and regulations they need more quickly.

Summary of the invention

The object of the present invention is therefore to overcome the deficiencies of the prior art by providing a legal retrieval service scheme that helps users find the laws and regulations they need more quickly.

The invention provides a legal database construction method comprising the following steps:

1) the legal database receives a new law text, splits it by entry to obtain the corresponding law entry documents, and creates a unique identifier for each;

2) each law entry document is segmented into words, and for each resulting lexical item, the unique record corresponding to that lexical item is created or updated in a content-based inverted index; each record of the content-based inverted index contains every law entry document whose content includes the record's lexical item, together with the corresponding index information;

3) return to step 1) to receive and process the next law text, until all law texts have been processed.

In step 2), the index information comprises the inverse document frequency of the corresponding lexical item and the term frequency of that lexical item in each law entry document in which it appears. The inverse document frequency is computed over the law entry documents in the legal database.

Step 2) comprises the following sub-steps:

21) traverse each law entry document obtained by splitting, and segment the current law entry document into words;

22) traverse all lexical items obtained by segmentation. For each lexical item, compute its term frequency in the current law entry document and look up the record for the current lexical item in the content-based inverted index. If a stored record for the current lexical item is found, add to it the identifier of the current law entry document and the term frequency of the current lexical item in that document, and update the inverse document frequency of the current lexical item. If no stored record is found, add the current lexical item to the dictionary of the content-based inverted index and create a new record containing the inverse document frequency of the current lexical item, the identifier of the current law entry document, and the term frequency of the current lexical item in that document.
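Sub-steps 21) and 22) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the patented implementation: each entry document is assumed to be already segmented into a token list, term frequency is taken as a raw count, and idf is assumed to be log(N / df) with N counted over law entry documents (the text says idf is based on entry documents but gives no formula). The class and field names are hypothetical.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Posting:
    """A posting-list node: one law entry document containing the term."""
    doc_id: str
    tf: int  # raw count of the term in this entry document (assumed tf definition)


@dataclass
class Record:
    """Inverted-index record for one lexical item."""
    idf: float = 0.0
    postings: list = field(default_factory=list)


class ContentIndex:
    """Content-field inverted index whose documents are law entries, not whole laws."""

    def __init__(self):
        self.dictionary = {}  # lexical item -> Record
        self.num_docs = 0     # number of law entry documents indexed so far

    def add_entry(self, doc_id, tokens):
        """Sub-steps 21-22 for one already-segmented law entry document."""
        self.num_docs += 1
        for term, tf in Counter(tokens).items():
            record = self.dictionary.get(term)
            if record is None:
                # Unseen lexical item: add it to the dictionary with a new record.
                record = Record()
                self.dictionary[term] = record
            record.postings.append(Posting(doc_id, tf))
        self._refresh_idf()

    def _refresh_idf(self):
        # Assumed idf = log(N / df), where N counts law entry documents.
        for record in self.dictionary.values():
            record.idf = math.log(self.num_docs / len(record.postings))
```

Because N grows with every entry, the sketch recomputes every idf after each insertion. That mirrors the "update the inverse document frequency" wording but costs time proportional to the dictionary size per entry; a batch build would more likely compute idf once at the end.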

The present invention also provides a legal retrieval service method based on the above legal database, comprising the following steps:

4) obtain a retrieval vector that acts on the content field;

5) for each keyword in the retrieval vector, use the content-based inverted index to find every law entry document whose content contains that keyword, together with the corresponding index information;

6) rank the hit law entry documents according to the corresponding index information.

In step 5), the index information comprises the inverse document frequency of the corresponding lexical item and the term frequency of that lexical item in each law entry document in which it appears. The inverse document frequency is computed over the law entry documents in the legal database.

Step 6) comprises the following sub-steps:

61) for each law entry document hit in step 5), obtain a law entry document vector with the same dimension as the retrieval vector. Each element of the law entry document vector corresponds to one keyword, and its value is derived from the inverse document frequency of that keyword found in step 5) and the term frequency of that keyword in the content of this law entry document;

62) take the similarity between the law entry document vector and the retrieval vector as the retrieval similarity of the corresponding law entry document, and rank the hit law entry documents by this retrieval similarity.

Wherein, described step 62) in, it is that law entry document vector sum is retrieved vectorial cosine similarity that described law entry document vector sum is retrieved vectorial similarity.

In step 6), the value of each element of the law entry document vector is the product of the inverse document frequency (found in step 5) of the keyword corresponding to that element and the term frequency of that keyword in the content of this law entry document.
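The ranking in steps 4) to 6) can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: it assumes the index is a plain dict mapping each lexical item to `(idf, {entry_id: tf})`, that the query has already been segmented, and that the retrieval vector holds the raw count of each keyword in the query (these claims do not state the query weights). `rank_entries` is a hypothetical name.

```python
import math
from collections import Counter


def rank_entries(query_terms, index):
    """Rank law entry documents against a segmented query (steps 4-6).

    `index` maps term -> (idf, {doc_id: tf}); this layout is a hypothetical
    stand-in for the content-based inverted index.
    """
    query_tf = Counter(query_terms)
    keywords = list(query_tf)            # one vector dimension per distinct keyword
    q = [query_tf[k] for k in keywords]  # retrieval vector: raw keyword counts (assumption)

    # Step 5: union of the posting lists of all keywords.
    hits = set()
    for k in keywords:
        if k in index:
            hits.update(index[k][1])

    q_norm = math.sqrt(sum(x * x for x in q))
    scores = {}
    for doc in hits:
        # Step 61: element = idf(keyword) * tf(keyword in this entry document).
        d = []
        for k in keywords:
            idf, postings = index.get(k, (0.0, {}))
            d.append(idf * postings.get(doc, 0))
        d_norm = math.sqrt(sum(x * x for x in d))
        dot = sum(a * b for a, b in zip(q, d))
        # Step 62: cosine similarity (0 if the document vector is all zeros).
        scores[doc] = dot / (q_norm * d_norm) if d_norm else 0.0
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Only the keyword dimensions enter the cosine, since the claims define the entry vector to have the same dimension as the retrieval vector.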

A law entry document comprises meta-information and content. The meta-information includes the title of the law text to which the entry belongs, and the chapter/section and number of the entry within that law text.

Step 6) further comprises: treating the law to which each hit law entry document belongs as a hit law; deriving the retrieval similarity of each hit law from the retrieval similarities of its hit law entry documents and ranking the hit laws accordingly; and then displaying, in ranked order, the content and meta-information of each hit law entry document within each hit law.

The legal retrieval service method further comprises the step:

7) for each hit law, find and display the laws related to it according to the similarity between this hit law and the other laws in the legal database.

The related laws are determined by the similarity between laws, which is obtained as follows: all law titles are segmented into lexical items, and the lexical items that belong to the subject, predicate and object structures of each title are extracted according to their part of speech; the extracted lexical items form a feature subspace; every law title is converted into a lexical item vector on this feature subspace; and the similarity of the two title vectors on the feature subspace is taken as the similarity between the two laws.

In step 7), for each hit law, an association graph between the hit law and its related laws is displayed. The graph consists of a set of points and the edges connecting them; each point represents the hit law or one of its related laws, and each edge shows the similarity between the two laws represented by its end points.

Compared with the prior art, the present invention has the following technical effects:

1. A single retrieval yields results that are precise to the level of individual law entries.

2. Besides the law entries that match the query, all related laws can also be obtained, which helps users find all laws relevant to the facts of a case more comprehensively and reduces the difficulty of retrieving laws and regulations.

Brief description of the drawings

Embodiments of the present invention are described in detail below with reference to the accompanying drawings, in which:

Fig. 1 is a schematic diagram of the overall flow of an embodiment of the invention;

Fig. 2 is a schematic flow chart of building a legal database that uses law entry documents as the storage unit, in an embodiment of the invention;

Fig. 3 shows an example structure of the dictionary and the posting record table of the inverted index in an embodiment of the invention;

Fig. 4 is a schematic flow chart of the retrieval service in an embodiment of the invention;

Fig. 5 is a schematic flow chart of the association retrieval service in an embodiment of the invention;

Fig. 6 shows an example association graph of a hit law and its related laws in an embodiment of the invention.

Embodiment

According to one embodiment of the invention, a legal retrieval service method is provided which, as shown in Fig. 1, comprises three parts. Part one builds a legal database whose storage unit is the law entry document, together with the corresponding inverted indexes. Part two receives a query and, based on the legal database and its inverted indexes, returns retrieval results precise to the level of law entries. Part three, based on the results of part two, further finds the laws related to the laws to which those results belong and adds them to the results. The three parts are described in detail below.

I. Building a legal database whose storage unit is the law entry document, together with the corresponding inverted indexes. In the prior art, an entire law usually forms one law document, and legal databases usually store legal data in units of law documents. In this embodiment, by contrast, the basic storage unit of the legal database is the law entry document, i.e. each law entry forms a separate document. For ease of understanding, the following description uses the law document "Election Law of the National People's Congress and Local People's Congresses at All Levels of the People's Republic of China" as an example. The text of this law document mainly comprises the title, notes, table of contents and body.

Fig. 2 is a schematic flow chart of building a legal database that uses law entry documents as the storage unit in an embodiment of the invention. Referring to Fig. 2, law documents are input into the legal database one by one, and steps 11 to 14 below are performed for each law document.

Step 11: identify and split the structure of the law document. Predefined rules are used to identify the structural information of the law document, such as parts, chapters and sections; each entry in the law text is then identified and located, and the text is split entry by entry. The body is split into N sub-documents, one per entry. For example, the Election Law of the National People's Congress and Local People's Congresses at All Levels of the People's Republic of China has 66 entries, so it is split into 66 sub-documents. Each sub-document contains the content of the legal provision, the title of the law it belongs to, and its hierarchical position within that law. For example, the sub-document for Article 1 of this Election Law stores: legal provision content: "In accordance with Article 12 of the Common Program of the Chinese People's Political Consultative Conference, the National People's Congress and the local people's congresses at all levels of the People's Republic of China shall be produced by the people of all nationalities through universal suffrage"; title of the law: "Election Law of the National People's Congress and Local People's Congresses at All Levels of the People's Republic of China"; hierarchical position within the law: Chapter 1, Article 1.
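Step 11 leaves the splitting rules unspecified. A minimal sketch, assuming that chapter headings (第…章, "Chapter …") and article headings (第…条, "Article …") each start a new line of the Chinese source text, could look like this. Parts (编) and sections (节) are not handled, and the patterns and the function name `split_into_entries` are illustrative assumptions.

```python
import re

# Hypothetical heading patterns, anchored at the start of a line so that
# in-text references such as "共同纲领第十二条" are not mistaken for headings.
CHAPTER = re.compile(r"^(第[一二三四五六七八九十百零〇\d]+章)\s*(.*)$")
ARTICLE = re.compile(r"^(第[一二三四五六七八九十百零〇\d]+条)\s*(.*)$")


def split_into_entries(law_title, text):
    """Split the body of a law into one sub-document per article."""
    entries = []
    chapter = ""
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        m = CHAPTER.match(line)
        if m:
            chapter = m.group(1)  # remember the enclosing chapter
            continue
        m = ARTICLE.match(line)
        if m:
            entries.append({
                "law_title": law_title,
                "hierarchy": f"{chapter} {m.group(1)}".strip(),
                "content": m.group(2),
            })
        elif entries:
            # A line without a heading continues the current article.
            entries[-1]["content"] += "\n" + line
    return entries
```

Lines before the first article (title, notes, table of contents) are skipped, which fits the embodiment's focus on splitting the body.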

Step 12: build indexes for the split sub-documents (i.e. law entry documents). After splitting, each entry is treated as a sub-document and segmented on the content field (i.e. the content part of the sub-document is segmented into words). For each lexical item obtained (repeated words count as the same lexical item), its term frequency (tf) and inverse document frequency (idf) are counted, and an inverted index is built on this basis. The inverted index consists of two parts: a dictionary and a posting record table. Fig. 3 shows an example structure of the dictionary and the posting record table of an inverted index. As shown in Fig. 3, each record is stored in the dictionary of the inverted index with a lexical item as its unique identifier. The dictionary also stores a link to the corresponding record in the posting record table and the inverse document frequency of the record's lexical item in the legal database. Note that this inverse document frequency is computed over all law entry documents in the legal database, rather than over whole law documents as is usual. In the posting record table, each record is stored as a linked list of the law entries in which the lexical item appears. For example, in Fig. 3 the record for lexical item 1 has four nodes representing law entry documents 1, 2, 3 and 4, meaning that lexical item 1 appears in each of them; the record for lexical item 2 has two nodes representing law entry documents 5 and 6, meaning that lexical item 2 appears in both. Each node representing a law entry document records the id of the law entry, the frequency of the lexical item in that law entry sub-document, the positions at which the lexical item appears in it, and other information.

Step 13: build indexes for the title of the law and for other information fields such as issue date and issuing body. The title is segmented and a corresponding inverted index is built on it; the other fields are not segmented, and the whole content of each field is treated as a single lexical item. For example, when the issuing body is the Central People's Government Committee, "Central People's Government Committee" as a whole is one lexical item in that inverted index.
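The only difference between the title index and the metadata indexes in step 13 is whether the field value is segmented. The sketch below builds one inverted index per field under these assumptions: records are keyed by a hypothetical `law_id`, `segment` stands in for a real Chinese word segmenter, and only the title field is segmented, so each metadata value is kept whole.

```python
from collections import defaultdict

# Assumed configuration: only these fields are segmented; every other
# metadata field (issuing body, issue date, ...) is indexed as one lexical item.
SEGMENTED_FIELDS = {"title"}


def build_field_indexes(laws, segment):
    """Return {field: {term: set(law_id)}}.

    `laws` maps law_id -> {field: value}; `segment` is any function that
    turns a string into a list of lexical items.
    """
    indexes = defaultdict(lambda: defaultdict(set))
    for law_id, fields in laws.items():
        for name, value in fields.items():
            terms = segment(value) if name in SEGMENTED_FIELDS else [value]
            for term in terms:
                indexes[name][term].add(law_id)
    return indexes
```

With `str.split` as a stand-in segmenter, the title "election law" yields the terms "election" and "law", while an issuing body such as "Central People's Government Committee" stays a single term.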

Step 14: store the inverted indexes for the content field, the title field, and other metadata fields such as issue date and issuing body in the system as files.

II. Receiving a query and, based on the legal database and its inverted indexes, returning retrieval results precise to the level of law entries. This embodiment can provide combined multi-field retrieval. It can also group several relevant entries that belong to the same law or regulation into one class and display them together. Typically, the retrieval service offers two modes: simple and advanced. In simple mode, the same query is searched on both the title field and the content field, and the user simply enters a query. Advanced mode supports filtering by enumerated values of metadata fields; in this mode the user specifies the fields to search and, for each field, enters a query or selects an enumerated value. For example: "content: consumer rights protection & title: protection law & issuing body (enumerated value): National People's Congress". The retrieval service returns the relevant entry contents together with their metadata. A query can be a single word (e.g. "economy"), a set of words (e.g. the two words "economy policy") or a phrase (e.g. "economic policy"). Different information fields usually use different retrieval methods; for example, queries on the content and title fields usually need to be segmented, whereas queries on other meta-information fields are not segmented and are used directly as keywords for the corresponding field. The entry-level retrieval of this embodiment mainly concerns retrieval on the content field, so the following description focuses on that; other parts unrelated to the gist of the invention are not described further here.

Fig. 4 is a schematic flow chart of the retrieval service in an embodiment of the invention. Referring to Fig. 4, the retrieval service comprises steps 21 to 24 below.

Step 21: receive a query that acts on the content field. As mentioned above, the query can be a word (e.g. "economy"), a set of words (e.g. "economy policy") or a phrase (e.g. "economic policy").

Step 22: segment the query to obtain one or more retrieval keywords, which form the retrieval vector.

Step 23: on the content field, for each keyword, use the inverted index of this field to find the inverse document frequency of the keyword, each law entry document in which the keyword appears, and the term frequency of the keyword in each such document. The inverted index stores the index records of all lexical items of this field in the legal database, so once the index record of the lexical item corresponding to a keyword is found, the required information is available. For example, when the keywords are "economy" and "policy", the index records of the lexical items "economy" and "policy" are looked up in the inverted index. The record for "economy" gives its inverse document frequency, each law entry document containing "economy", and the term frequency of "economy" in each of them; likewise, the record for "policy" gives the same information for "policy". Taking the union of the law entry document lists of "economy" and "policy" then yields all documents on this field that are relevant to the query. If advanced search mode is selected, retrieval is performed on every specified field with the corresponding one or more keywords.

Step 24: compute the retrieval relevance of each law entry document found and rank the documents by it, with higher relevance ranked earlier; the information of the ranked law entry documents is then returned as the retrieval result. For retrieval acting only on the content field, a law entry document vector with the same dimension as the retrieval vector is obtained from step 23. Each element of this vector corresponds to one keyword, and its value is the product of the inverse document frequency of that keyword found in step 23 and the term frequency of that keyword in the content of this law entry document. The cosine similarity between the law entry document vector and the retrieval vector can be used directly as the retrieval similarity of the corresponding law entry document on the content field, and the hit law entry documents are ranked by this retrieval similarity. This presents the overall retrieval result of the query on the content field.

For retrieval in advanced mode, the retrieval relevance of a law entry document is the linearly weighted sum of its relevance on each field. Under the vector space model of text, the relevance of a law entry document on a field equals the cosine similarity between the vector representation of the entry document on that field and the vector representation of the query content (i.e. the retrieval vector). In the vector representation of the entry document, the value of each dimension is the product of the lexical item's inverse document frequency and its term frequency in this law entry document; in the vector representation of the query content, only the term frequency of the lexical item is used. The resulting ranking of the query's overall retrieval result takes into account every field and the different influences of multiple keywords on each field.
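The advanced-mode score can be sketched as follows. The field weights are not given in the text, so the `weights` argument and the function names are hypothetical; enumerated metadata values are assumed to act as filters before scoring and are not modeled here. As the text specifies, the entry vector uses idf × tf and the query vector uses tf only.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity; 0.0 when either vector is all zeros."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return sum(a * b for a, b in zip(u, v)) / (nu * nv) if nu and nv else 0.0


def field_relevance(query_terms, doc_tf, idf):
    """Relevance of one entry document on one field.

    Query vector: term frequency only. Entry vector: idf * tf per query term.
    `doc_tf` maps term -> count in this entry's field; `idf` maps term -> idf.
    """
    q = Counter(query_terms)
    terms = list(q)
    query_vec = [q[t] for t in terms]
    doc_vec = [idf.get(t, 0.0) * doc_tf.get(t, 0) for t in terms]
    return cosine(query_vec, doc_vec)


def combined_relevance(per_field, weights):
    """Linear weighted sum of per-field relevances (weights are assumed inputs)."""
    return sum(weights.get(f, 0.0) * r for f, r in per_field.items())
```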

Further, in one embodiment, the law entry documents found in step 24 (the hit law entry documents) are grouped by the law to which they belong. The retrieval relevance of each whole law text is computed and used for ranking; it equals the sum of the retrieval relevances of all found law entry documents that belong to that law. In this way, the entry list obtained by retrieval is grouped by law, the relevance of each law is recomputed from the original entry relevances, and the laws are re-ranked. Results are then displayed law by law, listing only the relevant entries of each law rather than its full text, with the entries within each law presented in order of relevance. This makes the retrieval results more logical, better presented and easier for users to browse.
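The law-level regrouping can be sketched as follows, assuming entry scores have already been computed and that a hypothetical `entry_to_law` mapping gives the law title of each entry. As the text specifies, a law's score is the sum of the scores of its hit entries.

```python
from collections import defaultdict


def group_by_law(entry_scores, entry_to_law):
    """Group hit entries by law; a law's score is the sum of its entry scores.

    entry_scores: list of (entry_id, score); entry_to_law: entry_id -> law title.
    Returns [(law, law_score, [(entry_id, score), ...]), ...] with both the laws
    and the entries inside each law sorted by descending relevance.
    """
    groups = defaultdict(list)
    for entry_id, score in entry_scores:
        groups[entry_to_law[entry_id]].append((entry_id, score))
    result = []
    for law, entries in groups.items():
        entries.sort(key=lambda e: e[1], reverse=True)
        result.append((law, sum(s for _, s in entries), entries))
    result.sort(key=lambda r: r[1], reverse=True)
    return result
```

Summing favours a law with several moderately relevant entries over a law with a single highly relevant one; that follows from the stated rule rather than from a choice made in the sketch.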

III. Based on the results of part two, finding the laws related to the laws to which those results belong and adding them to the results. This part is in effect an association retrieval service: for laws and regulations, whose texts have a fairly standardized structure, it computes degrees of association and extracts a graphical description of the associations, so that the associations between laws and regulations are shown more intuitively and users can consult information related to the retrieval results.

Fig. 5 is a schematic flow chart of the association retrieval service in an embodiment of the invention. Referring to Fig. 5, the association retrieval service comprises steps 31 to 34 below.

Step 31: extract legal features. Law texts have a fairly standardized structure; in particular, the title largely indicates the domain and subject matter that a law or regulation is concerned with. The subject matter of a law can therefore be obtained by analyzing its title and represented as a vector in a feature vector subspace. Analysis of law and regulation titles shows that their syntactic structure is relatively simple, and the subject, object (noun parts) and predicate (verb part) of a title essentially cover the main content expressed by the law or regulation. Through word segmentation and part-of-speech analysis, the subject, predicate and object components of a title can easily be found and extracted as the features representing the title.

A concrete example with three law titles follows. First, Chinese word segmentation splits each law title into individual lexical items. For the title of law 1, "Income Tax Law of the People's Republic of China for Enterprises with Foreign Investment and Foreign Enterprises", the segmentation result is:

The People's Republic of China (PRC) / foreign trader / investment / enterprise / and / foreign country / enterprise / income tax / law

For the title of law 2, "Regulations on the Merger of Domestic Enterprises by Foreign Investors", the segmentation result is:

about / foreign country / investor / merger / domestic / enterprise / of / regulation

For the title of law 3, "Regulations on Electronic Patent Applications", the segmentation result is:

about / electronics / patent / application / of / regulation

The vector space formed by these three law titles is the set of all their lexical items: {merger, of, electronics, law, about, regulation, and, domestic, enterprise, application, income tax, investment, investor, foreign country, foreign trader, the People's Republic of China (PRC), patent}.

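Each law title is represented by a vector in this vector space: each element corresponds to one lexical item, and its value is the term frequency of that lexical item in the title.

With the elements in the order of the vocabulary listed above, the vector representations of the three law titles are:

law 1: (0, 0, 0, 1, 0, 0, 1, 0, 2, 0, 1, 1, 0, 1, 1, 1, 0)
law 2: (1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0)
law 3: (0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1)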

Further, to eliminate interference from lexical items unrelated to the subject matter of a law, part-of-speech tagging can be performed after segmenting the law title in order to find the subject, predicate and object components of the title, which are extracted as the features representing the title and then form the feature vector. Fixed suffixes of law titles, such as "regulations", "notice" and "measures", can also be regarded as lexical items unrelated to the subject matter, and these content-independent suffixes are removed as well.
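A minimal sketch of this feature extraction, assuming a segmenter/tagger has already produced `(word, part_of_speech)` pairs. The kept tags (`noun`, `verb`), the suffix list and the function names are hypothetical choices; this filter is looser than the document's worked example, whose feature subspace also leaves out some verbs (merger, application).

```python
from collections import Counter

# Hypothetical choices: content words are kept as subject/predicate/object
# candidates, and fixed title suffixes that say nothing about the topic are dropped.
KEPT_POS = {"noun", "verb"}
FIXED_SUFFIXES = {"law", "regulation", "notice", "measures"}


def title_features(tagged_title):
    """Count feature lexical items in one (word, pos)-tagged law title."""
    return Counter(
        word for word, pos in tagged_title
        if pos in KEPT_POS and word not in FIXED_SUFFIXES
    )


def to_vectors(feature_counts):
    """Build the feature subspace (sorted vocabulary) and one vector per title."""
    space = sorted(set().union(*feature_counts))
    return space, [[c[t] for t in space] for c in feature_counts]
```

Applied to law 1 with "income tax" tagged as a noun, this yields the six features of the first row of the table below, with "enterprise" counted twice.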

In the example, for the title of law 1, "Income Tax Law of the People's Republic of China for Enterprises with Foreign Investment and Foreign Enterprises", the segmentation result with part-of-speech tags is:

The People's Republic of China (PRC)/noun  foreign trader/noun  investment/verb  enterprise/noun  and/conjunction  foreign country/noun  enterprise/noun  income tax/nominal morpheme  law/noun

For the title of law 2, "Regulations on the Merger of Domestic Enterprises by Foreign Investors", the segmentation result is:

about/preposition  foreign country/noun  investor/noun  merger/verb  domestic/locative  enterprise/noun  of/auxiliary  regulation/noun

For the title of law 3, "Regulations on Electronic Patent Applications", the segmentation result is:

about/preposition  electronics/noun  patent/noun  application/verb  of/auxiliary  regulation/noun

The resulting feature subspace of the three titles is:

{ electronics, enterprise, income tax, investment, investor, foreign country, foreign trader, the People's Republic of China (PRC), patent }

The vector representations of the three laws in the feature subspace are as follows:

Law	Electronics	Enterprise	Income tax	Investment	Investor	Foreign country	Foreign trader	PRC	Patent
Law 1	0	2	1	1	0	1	1	1	0
Law 2	0	1	0	0	1	1	0	0	0
Law 3	1	0	0	0	0	0	0	0	1

Step 32: compute law similarity. As described above, feature extraction turns the title of a law or regulation into a lexical item vector in the feature subspace. Titles are thus represented with the keyword vector space model, with the space restricted to the extracted feature lexical items. The similarity of two law titles in the feature subspace can then be used as the similarity of the two laws.

In one embodiment, the similarity of two laws is the cosine similarity of their two titles computed on the feature vector subspace; for title vectors A and B with components a_i and b_i:

\mathrm{CosSimilarity}(\vec{A}, \vec{B}) = \frac{\vec{A} \cdot \vec{B}}{|\vec{A}| \, |\vec{B}|} = \frac{\sum_{i} a_{i} b_{i}}{\sqrt{\sum_{i} a_{i}^{2}} \, \sqrt{\sum_{i} b_{i}^{2}}}

For law 1, law 2 and law 3 in the example above, the similarity results are as follows:

CosSimilarity(law 1, law 3) = 0

CosSimilarity(law 2, law 3) = 0
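The cosine values can be checked against the feature-subspace table above with a few lines of Python. The vectors are copied from that table, and `cos_similarity` is an illustrative helper implementing the cosine formula above. Besides the two zero values, the script also prints the law 1 / law 2 similarity, 3 / (3√3) = 1/√3 ≈ 0.577, which the list above does not state.

```python
import math

# Feature-subspace vectors from the table above. Column order: electronics,
# enterprise, income tax, investment, investor, foreign country,
# foreign trader, PRC, patent.
LAW_VECTORS = {
    "law 1": [0, 2, 1, 1, 0, 1, 1, 1, 0],
    "law 2": [0, 1, 0, 0, 1, 1, 0, 0, 0],
    "law 3": [1, 0, 0, 0, 0, 0, 0, 0, 1],
}


def cos_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


for x, y in [("law 1", "law 2"), ("law 1", "law 3"), ("law 2", "law 3")]:
    print(x, y, round(cos_similarity(LAW_VECTORS[x], LAW_VECTORS[y]), 4))
# law 1 law 2 0.5774
# law 1 law 3 0.0
# law 2 law 3 0.0
```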

Step 33: based on the similarity between laws, return the laws associated with the laws in the results of part two. To reduce the actual amount of computation and to avoid generating associations with too low a degree of association, the laws are clustered before associations are extracted, producing several sets of laws and regulations whose members are highly similar to one another. The laws are clustered by hierarchical clustering on the computed cosine similarities: based on a preset threshold, highly similar laws are grouped into one class. For example, law 1 and law 2 are grouped into one class, while law 3 belongs to another. The pairwise similarity values within each cluster are recorded for ranking when association results are returned, and associations are extracted only within clusters. In one example, the system obtains N matching laws for the query keywords; for instance, a search for "income tax" returns the "Income Tax Law of the People's Republic of China for Enterprises with Foreign Investment and Foreign Enterprises". At the same time, the system looks up the pre-stored law association clustering result, obtains the cluster containing law 1, takes from this cluster the top K related laws (ranked by similarity) that satisfy the threshold, such as law 2, and returns them as the association results of law 1. Law 3 does not belong to the same cluster and therefore is not returned as an association result of law 1.
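The clustering and lookup in step 33 can be sketched as follows. The text does not name a linkage criterion, so single linkage is assumed here: cutting a single-linkage dendrogram at a similarity threshold gives the connected components of the graph of pairs whose similarity reaches the threshold, which a union-find computes directly. The threshold, K and the function names are hypothetical.

```python
from itertools import combinations


def single_linkage_clusters(names, sim, threshold):
    """Single-linkage clustering cut at `threshold` (assumed linkage).

    Two laws end up in the same cluster if a chain of pairs with
    sim >= threshold connects them; union-find merges those pairs.
    """
    parent = {n: n for n in names}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for a, b in combinations(names, 2):
        if sim(a, b) >= threshold:
            parent[find(a)] = find(b)
    clusters = {}
    for n in names:
        clusters.setdefault(find(n), []).append(n)
    return list(clusters.values())


def related_laws(hit, clusters, sim, threshold, k):
    """Top-k laws from the hit law's cluster whose similarity meets the threshold."""
    cluster = next(c for c in clusters if hit in c)
    candidates = [(other, sim(hit, other)) for other in cluster if other != hit]
    candidates = [c for c in candidates if c[1] >= threshold]
    return sorted(candidates, key=lambda c: c[1], reverse=True)[:k]
```

The threshold filter in `related_laws` matters because single linkage can chain together laws that are not directly similar to the hit law.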

Further, when association results are returned, the laws in the associated retrieval results can be sorted by their similarity to the corresponding law in the second part of the retrieval results: related laws with greater similarity to that law are ranked higher.
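Retrieving and ordering the associated laws for one hit law might look like the sketch below: it looks only inside the hit law's own cluster, drops candidates below the threshold, sorts the rest by stored similarity in descending order, and keeps the first K. The function name, its arguments, and the `{(x, y): similarity}` table layout (with both orderings stored) are assumptions for illustration.

```python
def related_laws(hit, clusters, sim_table, k, threshold):
    """Top-k laws related to `hit`, drawn only from hit's own cluster.

    clusters: list of sets of law identifiers.
    sim_table: {(x, y): similarity}, with both orderings stored.
    Returns [(law, similarity)] with the most similar law first.
    """
    cluster = next((c for c in clusters if hit in c), set())
    candidates = [
        (other, sim_table[(hit, other)])
        for other in cluster
        if other != hit and sim_table[(hit, other)] >= threshold
    ]
    # Laws more similar to the hit law are ranked earlier.
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return candidates[:k]
```

A law outside the hit law's cluster never appears in the output, which is why law 3 in the example above is not returned for law 1.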

Meanwhile, for the related laws obtained, a graph structure G(V, E) describing the associations is generated from the pre-stored pairwise similarity values. The vertex set V consists of the laws in the retrieval results and their related laws; an edge in E indicates that the two nodes (two laws) it connects are associated. A shorter edge indicates a closer relationship, i.e., a greater similarity between the two laws. Each edge can also display the similarity value between the two laws at its endpoints.
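A minimal sketch of building G(V, E) from the stored pairwise similarities follows. The patent only states that a shorter edge means a closer relationship; mapping a similarity s to an edge length of 1 - s is a hypothetical choice, as are the function name and the similarity-table layout.

```python
from itertools import combinations


def association_graph(nodes, sim_table):
    """Build G(V, E) over hit laws and their related laws.

    nodes: iterable of law identifiers (the vertex set V).
    sim_table: {(x, y): similarity}, with both orderings stored.
    Returns (V, E), where each edge is (x, y, similarity, length).
    """
    V = sorted(set(nodes))
    E = []
    for x, y in combinations(V, 2):
        s = sim_table.get((x, y), 0.0)
        if s > 0.0:  # an edge means the two laws are associated
            # Hypothetical layout rule: higher similarity gives a shorter edge.
            E.append((x, y, s, 1.0 - s))
    return V, E
```

A graph-drawing tool can then use the length for layout and print the similarity value as the edge label.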

Fig. 6 shows an example of the associations between a hit law and its related laws in one embodiment of the invention. As shown in Fig. 6, the hit law has the greatest similarity to related law 2 and the smallest to related law 3; related laws 1 and 2 are also similar to each other.

In the embodiments above, a new law database is built from law-article documents and the corresponding inverted index, so that a single retrieval yields results accurate to the individual law article. Moreover, the embodiments not only obtain the law articles matching the query statement but also obtain all related laws. In the prior art, by contrast, a user looking for more related laws often has to try multiple keywords or keyword combinations and search repeatedly before finding all the required related law articles. The present invention therefore helps users find all laws relevant to the facts of a case more conveniently and comprehensively, reducing the difficulty of retrieving legal and regulatory information.

The foregoing are only illustrative embodiments of the present invention and are not intended to limit its scope. Any equivalent variations, modifications and combinations made by those skilled in the art without departing from the concept and principles of the invention shall fall within the protection scope of the invention.

Claims

1. A law database construction method, comprising the following steps:

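1) for a new legal text, splitting the received legal text by article to obtain the corresponding law-article documents, and creating a unique identifier for each;

2) performing word segmentation on each law-article document and, for each term obtained, creating or updating the unique record for that term in a content-based inverted index, wherein each record of the content-based inverted index includes every law-article document containing the term of that record, together with the corresponding index information;

3) returning to step 1) to process the next legal text until all legal texts have been processed.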

2. The law database construction method according to claim 1, wherein in step 2) the index information comprises the inverse document frequency of the corresponding term and the term frequency of that term in each law-article document, the inverse document frequency being computed over the law-article documents in the law database.

3. The law database construction method according to claim 2, wherein step 2) comprises the following sub-steps:

4. A legal retrieval service method based on the law database construction method according to claim 1, comprising the following steps:

4) obtaining a retrieval vector acting on the content field;

5. The legal retrieval service method according to claim 4, wherein in step 5) the index information comprises the inverse document frequency of the corresponding term and the term frequency of that term in each law-article document, the inverse document frequency being computed over the law-article documents in the law database.

6. The legal retrieval service method according to claim 5, wherein step 6) comprises the following sub-steps:

62) taking the similarity between the law-article document vector and the retrieval vector as the retrieval similarity of that law-article document in the content field, and sorting the hit law-article documents by their retrieval similarity.

7. The legal retrieval service method according to claim 6, wherein in step 62) the similarity between the law-article document vector and the retrieval vector is their cosine similarity.

8. The legal retrieval service method according to claim 7, wherein in step 6) the value of each element of the law-article document vector is the product of the inverse document frequency, found in step 5), of the keyword corresponding to that element and the term frequency with which that keyword appears in the content of the law-article document.

9. The legal retrieval service method according to claim 6, wherein the law-article document comprises meta-information and content, the meta-information including the title of the legal text to which the law article belongs, and the chapter or section and the number of the law article within that legal text.

10. The legal retrieval service method according to claim 9, wherein step 6) further comprises: taking the law to which each hit law-article document belongs as a hit law; deriving the retrieval similarity of each hit law from the retrieval similarities of its hit law-article documents and sorting the hit laws accordingly; and then displaying, in the sorted order, the content and meta-information of each hit law-article document in each hit law.

11. The legal retrieval service method according to claim 10, further comprising the step of:

12. The legal retrieval service method according to claim 11, wherein in step 7), for each hit law, an association graph between the hit law and its related laws is displayed, the association graph comprising a series of points and edges connecting the points, each point representing either the hit law or one of its related laws, and each edge showing the similarity between the two laws at its endpoints.
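The indexing and retrieval steps in claims 1 to 10 can be illustrated end to end with the Python sketch below. It is not the patented implementation, and several details are assumptions: whitespace tokenization stands in for Chinese word segmentation, IDF is taken as ln(N/df) over the N law-article documents, query terms are weighted TF × IDF, and a hit law takes the score of its best-matching article. Following claim 8, each article vector has one element per query keyword, equal to that keyword's IDF multiplied by its term frequency in the article. Article identifiers and texts are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Hypothetical stand-in for Chinese word segmentation.
    return text.lower().split()


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def build_index(articles):
    """Content-based inverted index over law-article documents.

    articles: {article_id: content}. Each term's record maps the IDs of
    the articles containing it to the term frequency (TF) there.
    IDF = ln(N / df) over all N articles (an assumed formula).
    """
    postings = defaultdict(dict)
    for art_id, text in articles.items():
        for term, tf in Counter(tokenize(text)).items():
            postings[term][art_id] = tf  # create or update the term's record
    n = len(articles)
    idf = {t: math.log(n / len(docs)) for t, docs in postings.items()}
    return postings, idf


def search(query, postings, idf):
    """Return [(article_id, retrieval similarity)], best first."""
    # Query vector: query-term TF x IDF (the weighting is an assumption).
    q_tf = Counter(t for t in tokenize(query) if t in postings)
    q_vec = {t: tf * idf[t] for t, tf in q_tf.items()}
    hits = {art for t in q_vec for art in postings[t]}
    scores = {}
    for art in hits:
        # Article vector over the query keywords: IDF x TF in the article.
        d_vec = {t: idf[t] * postings[t].get(art, 0) for t in q_vec}
        scores[art] = cosine(q_vec, d_vec)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def rank_laws(ranked_articles, law_of):
    """Score each hit law by its best-matching article (an assumption)."""
    best = {}
    for art, score in ranked_articles:
        law = law_of[art]
        best[law] = max(score, best.get(law, 0.0))
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)
```

When both query terms of a two-word query such as "income tax" have the same IDF, an article containing each word once scores 1.0, and an article containing only one of them scores 1/√2 ≈ 0.707.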