WO2016026359A1 - Computer-based method and device for parsing natural language syntactic structures - Google Patents

Computer-based method and device for parsing natural language syntactic structures

Info

Publication number
WO2016026359A1
Authority
WO
WIPO (PCT)
Prior art keywords
vector
syntax
syntactic
vectors
unit
Prior art date
Application number
PCT/CN2015/083760
Other languages
French (fr)
Chinese (zh)
Inventor
秦一男
Original Assignee
秦一男
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 秦一男 filed Critical 秦一男
Publication of WO2016026359A1 publication Critical patent/WO2016026359A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis

Definitions

  • the present invention relates to the field of computer data processing, and in particular to a computer-based natural language syntax structure parsing method and apparatus.
  • Natural language processing is an important direction in the field of computer science and artificial intelligence. It studies various theories and methods that enable effective communication between humans and computers using natural language.
  • Syntactic structure analysis is an important aspect of natural language processing. It automatically divides the sentence components of natural language sentences by computer to assist in the further processing of sentences.
  • PCFG: Probabilistic Context-Free Grammars
  • the syntactic parsing result with the highest probability is selected as the final syntactic structure.
  • the present invention provides a computer-based natural language syntax structure analysis method and apparatus, which has unique ideas, ingenious methods, and detailed argumentation, and fully utilizes the laws of mathematics and computer science, and the method has high accuracy.
  • the amount of calculation is very large and has high technical difficulty.
  • the invention provides a computer-based natural language syntax structure parsing method, comprising:
  • the possible value of the guide element is one of a parallel related word unit or a dependent related word unit whose number is smaller than the corresponding predicate verb unit number, or a parallel related word unit whose number is smaller than the corresponding predicate verb unit number and one adjacent thereto
  • the possible value of the subject element is one of the noun pronoun units whose number is smaller than the corresponding predicate verb unit number, or the number of the largest word unit is smaller than the juxtaposition included in the total parallel noun pronoun combination vector family of the corresponding predicate verb unit number.
  • the predicate element is a corresponding predicate verb unit
  • the possible value of the object element is one of the noun pronoun units whose number is greater than the corresponding predicate verb unit number and less than the adjacent predicate verb unit number, or the number of the smallest word unit is greater than the corresponding predicate verb unit number.
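  • The candidate rules above can be illustrated with a minimal Python sketch. It is not the patented implementation: the `WordUnit` class, the type labels, and the `element_candidates` function are hypothetical names, "adjacent predicate verb unit" is assumed to mean the next predicate verb unit in the sentence, the empty unit is assumed to be allowed for the guide, subject, and object elements, and parallel noun pronoun combination vectors are left out for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WordUnit:
    number: int  # position in the pre-processed sentence, starting at 1
    kind: str    # hypothetical labels: "CONJ", "LEAD", "VERB", "NP"
    text: str


EMPTY = None  # stands for the empty unit (assumption: allowed for guide/subject/object)


def element_candidates(units, verb):
    """Candidate guide/subject/predicate/object elements for one predicate verb unit."""
    verb_numbers = sorted(u.number for u in units if u.kind == "VERB")
    later = [n for n in verb_numbers if n > verb.number]
    next_verb = later[0] if later else float("inf")  # assumed meaning of "adjacent"

    guides = [u for u in units
              if u.kind in ("CONJ", "LEAD") and u.number < verb.number]
    subjects = [u for u in units
                if u.kind == "NP" and u.number < verb.number]
    objects = [u for u in units
               if u.kind == "NP" and verb.number < u.number < next_verb]
    return {
        "guide": guides + [EMPTY],
        "subject": subjects + [EMPTY],
        "predicate": [verb],
        "object": objects + [EMPTY],
    }


# Pre-processed form of "I know that he likes food" (illustrative data only).
units = [
    WordUnit(1, "NP", "I"),
    WordUnit(2, "VERB", "know"),
    WordUnit(3, "LEAD", "that"),
    WordUnit(4, "NP", "he"),
    WordUnit(5, "VERB", "likes"),
    WordUnit(6, "NP", "food"),
]
candidates_for_likes = element_candidates(units, units[4])
```

  • In this sketch every predicate verb unit gets its own candidate lists, so later steps can combine them freely and leave the pruning to the exclusion operations.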
  • S5 performs the following operations in order, to exclude possible matrix solutions of the syntactic structure that do not meet the conditions:
  • the possible matrix solution may be excluded from the syntax structure
  • in any of the possible matrix solutions, if there is a syntax vector that has no substitution relationship with other syntax vectors, perform an insertion operation to obtain all possible syntax parsing structures corresponding to the possible matrix solution, and verify whether the statement obtained from each possible syntax parsing structure is identical to the pre-processed statement, further comprising:
  • S5.5.2 take a second type of syntax vector Mark one by one according to the predetermined direction The order value of each syntax element in the message; after appending the order value of the syntax element, take any The i-th syntax element in the construct, only a unique gap is constructed on the first side of the syntax element; after the void, take a syntax vector Second type of syntax vector Syntactic vector in the form of overall insertion Insert the constructed vacancy, and then generate a new syntax vector, record this new syntax vector as The syntactic vectors obtained by inserting the whole into space are collectively referred to as the third type of syntax vector;
  • the third type of syntax vector Pair vector from the predetermined direction The first syntactic element on the first side starts into the vector Vector contained in Each of the syntax elements up to the first syntactic element on the second side, all of which are labeled with a sequence value; Vector contained in The element on the first side, without the order value; the vector The first syntax element on the second side is marked as Will be vectored as described above.
  • the annotated part of the syntax vector is denoted the tail syntax vector
  • Each syntax element in the label is labeled with a sequential value; after the order value of the syntax element is annotated, take one The tth syntax element in the construct, constructing a unique gap on the first side of the syntax element; after the void, taking an unused second type of syntax vector The vector is inserted as a whole Insert the previously constructed gap and generate a new vector, then the new vector is recorded as
  • S2 includes generating a vector family of parallel noun pronouns:
  • the two noun pronoun units are used as a parallel noun pronoun combination vector, and the parallel noun pronoun combination vector is retained;
  • S2.4: in each parallel noun pronoun combination vector family, select the word unit with the highest number among all the combination vectors as the largest word unit of the family, for use in subsequently generating the subject;
  • the word unit with the lowest number among all the combination vectors is taken as the smallest word unit of the family, for use in subsequently generating the object.
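  • As a small illustration of S2.4, the sketch below computes the smallest and largest word unit numbers of a family and applies them to the subject and object conditions. The function names are hypothetical, a family is assumed to be a list of tuples of word unit numbers, and the upper bound by the adjacent predicate verb unit number is treated as optional because the source states it only in some variants.

```python
def family_extremes(family):
    """Return (smallest, largest) word unit number over all combination vectors."""
    numbers = [n for combo in family for n in combo]
    return min(numbers), max(numbers)


def usable_as_subject(family, verb_number):
    """A family may supply a subject only if its largest word unit precedes the verb."""
    _, largest = family_extremes(family)
    return largest < verb_number


def usable_as_object(family, verb_number, next_verb_number=float("inf")):
    """A family may supply an object only if its smallest word unit follows the verb.

    The bound by the adjacent predicate verb unit is an optional assumption.
    """
    smallest, _ = family_extremes(family)
    return verb_number < smallest < next_verb_number


# Illustrative family built from noun pronoun units numbered 1, 3 and 5.
family = [(3, 5), (1, 3, 5)]
extremes = family_extremes(family)
```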
  • generating corresponding subject elements includes:
  • the possible value of the subject element is one of the noun pronoun units whose number is smaller than the corresponding predicate verb unit number, or one of the parallel noun pronoun combination vectors contained in a parallel noun pronoun combination vector family whose largest word unit number is smaller than the corresponding predicate verb unit number, or an empty unit.
  • the possible value of the subject element is one of the noun pronoun units whose number is smaller than the corresponding predicate verb unit number, or one of the parallel noun pronoun combination vectors contained in a parallel noun pronoun combination vector family whose largest word unit number is smaller than the corresponding predicate verb unit number, or one of the syntax vectors corresponding to a predicate verb unit, or an empty unit.
  • generating corresponding object elements includes:
  • the possible value of the object element is one of the noun pronoun units whose number is greater than the corresponding predicate verb unit number, or the number of the smallest word unit is greater than the corresponding number.
  • the possible value of the object element is one of the noun pronoun units whose number is greater than the corresponding predicate verb unit number and smaller than the adjacent predicate verb unit number, or one of the parallel noun pronoun combination vectors contained in a parallel noun pronoun combination vector family whose smallest word unit number is greater than the corresponding predicate verb unit number and smaller than the adjacent predicate verb unit number.
  • the possible matrix solution of the syntactic structure may be replaced by a possible linear expression solution of the syntactic structure
  • the possible matrix solution of the syntactic structure is equivalent to the possible linear expression solution of the syntactic structure
  • the possible linear expression solution of the syntactic structure comprises syntax vector expressions arranged in order of predicate verb unit number; each syntax vector expression is an expression in which the guide element, subject element, predicate element, and object element of the corresponding syntax vector are added one by one in order.
  • the method further includes:
  • Each syntax vector and corresponding syntax structure relationship in the syntax structure analysis result are displayed in a human-computer interaction interface by a tree structure.
  • the invention also provides an apparatus for analyzing a natural language syntax structure based on a computer, comprising:
  • a reading component configured to read a pre-processed statement data structure to be parsed, wherein the pre-processed statement data structure includes only parallel related word units, dependent related word units, predicate verb units, and noun pronoun units, and each word unit is numbered according to its order in the pre-processed statement and marked with its type;
  • An element generating component configured to generate a corresponding guide element, a subject element, a predicate element, and an object element for each predicate verb unit;
  • the possible value of the guide element is one of a parallel related word unit or a dependent related word unit whose number is smaller than the corresponding predicate verb unit number, or a parallel related word unit whose number is smaller than the corresponding predicate verb unit number and one of them
  • the possible value of the subject element is one of the noun pronoun units whose number is smaller than the corresponding predicate verb unit number, or the number of the largest word unit is smaller than the juxtaposition included in the total parallel noun pronoun combination vector family of the corresponding predicate verb unit number.
  • the predicate element is a corresponding predicate verb unit
  • the possible value of the object element is one of the noun pronoun units whose number is greater than the corresponding predicate verb unit number and less than the adjacent predicate verb unit number, or the number of the smallest word unit is greater than the corresponding predicate verb unit number.
  • a vector generating component configured to obtain all possible values of the syntax vector corresponding to each predicate verb unit according to the possible values of the guide element, the subject element, the predicate element, and the object element, where each syntax vector includes a guide element, a subject element, a predicate element, and an object element;
  • a matrix generating component configured to generate at least one possible matrix solution of the syntax structure according to all possible values of all syntax vectors, wherein each possible matrix solution is composed of syntax vectors arranged in order of predicate verb unit number;
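  • One way to picture the vector and matrix generating components is as two nested Cartesian products. The sketch below is an illustration under assumptions, not the claimed implementation: `syntax_vectors` and `matrix_solutions` are hypothetical names, elements are represented by word unit numbers, and `None` stands for the empty unit.

```python
from itertools import product


def syntax_vectors(candidates):
    """All (guide, subject, predicate, object) tuples for one predicate verb unit."""
    return list(product(candidates["guide"], candidates["subject"],
                        candidates["predicate"], candidates["object"]))


def matrix_solutions(per_verb):
    """Every way of picking one syntax vector per predicate verb unit.

    `per_verb` must already be ordered by predicate verb unit number, so each
    returned solution lists its rows in that order.
    """
    rows = [syntax_vectors(c) for c in per_verb]
    return [list(choice) for choice in product(*rows)]


# Illustrative candidates for a sentence with predicate verb units 2 and 5.
per_verb = [
    {"guide": [None], "subject": [1, None], "predicate": [2], "object": [4, None]},
    {"guide": [3, None], "subject": [1, 4, None], "predicate": [5], "object": [6, None]},
]
solutions = matrix_solutions(per_verb)
```

  • The number of candidate solutions grows multiplicatively with the number of predicate verb units, which is why the exclusion modules that follow matter.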
  • the solving component excludes possible matrix solutions of the syntax structure through the operation of the following modules:
  • a first exclusion module, which excludes a possible matrix solution of the syntax structure if some order value does not appear in it;
  • a second exclusion module, which excludes a possible matrix solution if the same order value appears more than once, whether in different syntax vectors or within the same syntax vector;
  • the syntactic vectors having mutual substitution relations with other syntax vectors are all equally substituted, and if the cross-contradictions of the two syntax vectors appear after the equal-substitution, Excluding the syntactic structure possible matrix solution;
  • a fifth exclusion module in any one of the possible matrix solutions, if there is a syntax vector that has no substitution relationship with other syntax vectors, performing an interpolation operation to obtain a possible syntax parsing structure corresponding to all the possible matrix solutions, and Verification of whether the statement obtained according to the possible syntax parsing structure is identical to the preprocessed statement, further comprising:
  • the first sub-module first performs an equal substitution of the syntactic vectors in the possible matrix solutions with the substitution relationship between them, thereby transforming the possible matrix solutions into a set of syntactic vectors without substitution relations between them.
  • the syntactic vector in the possible matrix solution is called the first kind of syntactic vector, and the transformed syntactic vector will be transformed.
  • the second sub-module taking a second type of syntax vector Mark one by one according to the predetermined direction
  • the third type of syntax vector Pair vector from the predetermined direction
  • the first syntactic element on the first side starts into the vector Vector contained in Each of the syntax elements up to the first syntactic element on the second side, all of which are labeled with a sequence value; Vector contained in The element on the first side, not the order value; the vector The first syntax element on the second side is marked as Will be vectored as described above
  • the annotated part of the syntax vector is denoted the tail syntax vector
  • Each syntax element in the label is labeled with a sequential value; after the order value of the syntax element is annotated, take one The tth syntax element in the construct, constructing a unique gap on the first side of the syntax element; after the void, taking an unused second type of syntax vector The vector is inserted as a whole Insert the previously constructed gap and generate a new vector, then the new vector is recorded as
  • the fourth sub-module repeats the operation of the third sub-module: each time a gap-making and insertion step ends, the third-type syntax vector obtained from that step undergoes the next gap-making and insertion operation, until all second-type syntax vectors have been inserted; a single-row third-type syntax vector is finally obtained, and it is called the final single-row vector;
  • a fifth sub-module, which excludes a possible syntactic parsing structure if all of the final single-row vectors corresponding to it contain two order values whose positions are reversed;
  • the sixth sub-module repeatedly calls the operations of the second sub-module to the fifth sub-module until all possible syntactic parsing structures are traversed.
  • the result display component displays the syntax vector and the corresponding syntax structure relationship in the syntax structure analysis result on the human-computer interaction interface by using a tree structure.
  • FIG. 1 is a flow chart of a method for analyzing a computer-based natural language syntax structure according to an embodiment of the present invention
  • FIG. 2 is a schematic diagram of an apparatus for analyzing a computer-based natural language syntax structure according to an embodiment of the present invention.
  • Natural language is a free monoid (a free semigroup with an identity element) over vocabulary and punctuation.
  • the following is explained by taking English as an example, but those skilled in the art will readily understand that the method of the present invention is also applicable to other natural languages.
  • A symbol string over A is a finite-length linear arrangement formed by placing elements of A adjacent to one another; elements may be repeated.
  • For example, the symbol string acbaab can be formed from the set {a, b, c}.
  • This string of symbols contains three occurrences of a, two occurrences of b, and one occurrence of c, which is different from the symbol string acaabb. Although each symbol appears the same number of times, their order is different. It can be seen that the symbol string is ordered.
  • a symbol string of length 0 is a string of 0 symbols, denoted as e.
  • A symbol string of length n over A is a mapping from the natural number set N to A: f: N → A.
  • A new symbol string can be formed by adjacency. For example, placing the symbol string bbac adjacent to the right end of the symbol string abac forms the new symbol string abacbbac.
  • This adjacency operation is referred to as concatenation.
  • Let the first symbol string be given by the set {(1, x_1), (2, x_2), (3, x_3), ..., (n-1, x_{n-1}), (n, x_n)};
  • and the second symbol string by the set {(1, y_1), (2, y_2), (3, y_3), ..., (m-1, y_{m-1}), (m, y_m)}.
  • The concatenation of the two strings has length n+m and is the symbol string given by the set {(1, x_1), (2, x_2), (3, x_3), ..., (n-1, x_{n-1}), (n, x_n), (n+1, y_1), (n+2, y_2), ..., (n+m, y_m)}. Concatenation is therefore a binary operation defined on symbol strings whose result is a new symbol string.
  • The concatenation mark may also be omitted, so that the concatenation is written simply as the two strings side by side.
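  • The set-of-pairs definition of concatenation can be checked with a short Python sketch. The helper names `as_mapping`, `concat`, and `as_text` are hypothetical and chosen only for this illustration.

```python
def as_mapping(text):
    """Represent a symbol string as a set of (position, symbol) pairs, positions from 1."""
    return {(i, ch) for i, ch in enumerate(text, start=1)}


def concat(left, right):
    """Concatenate two symbol strings given as (position, symbol) sets.

    Positions of the right string are shifted by the length n of the left one,
    matching the pairs (n+1, y_1), ..., (n+m, y_m) in the definition.
    """
    n = len(left)
    return left | {(n + i, ch) for i, ch in right}


def as_text(mapping):
    """Read the symbols back in position order."""
    return "".join(ch for _, ch in sorted(mapping))


EMPTY = set()  # the symbol string of length 0, written e in the text

joined = concat(as_mapping("abac"), as_mapping("bbac"))
```

  • Because the empty set is the identity for this operation and the operation is associative, the sketch also reflects why the text treats the strings as a monoid.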
  • a is said to be a word unit consisting of elements of A if and only if b_1, b_2, ..., b_m ∈ A.
  • The unique word unit of length 0 is called the empty unit and is denoted e.
  • The set A^s of word units, together with concatenation and the empty unit e, forms an algebraic system that is a free monoid over the English word and punctuation set A.
  • the word units are arranged in order according to their order in the sentence, the subscripts are sequentially numbered, and ⁇ ( ⁇ ) is the number of the word unit ⁇ in the sentence S.
  • N is a natural number set
  • Part A2 defines a partial order relationship
  • ⁇ A s For any word unit ⁇ in A s , ⁇ A s , called ⁇ ⁇ ⁇ , if and only if ⁇ , ⁇ number ⁇ ( ⁇ ), ⁇ ( ⁇ ) satisfies: ⁇ ( ⁇ ) ⁇ ( ⁇ ) .
  • the binary relationship ⁇ ⁇ is strictly partial order relationship.
  • Part A3 defines partial addition and syntactic order values
  • the sequence number of any given continuous string ⁇ is ⁇ ( ⁇ ), and ⁇ ( ⁇ ) is called the left-to-right order value of ⁇ . That is, the syntax element ⁇ in one original sentence S is given, and the syntax order value of the syntax element ⁇ in the original sentence S is denoted as ⁇ ( ⁇ ).
  • the word unit a i constituting the sentence is recognized as a constant.
  • the word unit a i has its language attribute.
  • the word units constituting the core sentence structure can be divided into four types: a parallel related word unit, a dependent related word unit, a predicate verb unit, and a noun pronoun unit.
  • Each word unit includes at least one natural language vocabulary, which may be a word, a phrase of a particular structure, or a juxtaposition of multiple synonyms.
  • A parallel related word unit may be a coordinating conjunction that joins parallel sentences or parallel syntactic components: and, but, or, so, yet.
  • A dependent related word unit may be a connective pronoun that introduces a clause, or a connective adverb or connective phrase that introduces a clause.
  • Typical introducing words include: that, what, which, who, whom, whose, where, when, why, how, whoever, whichever, while, whether, because, before, after, whatever, whenever, wherever, as, if, once, until, though, unless, although, no matter what, no matter who, no matter whom, no matter which, in that, in order that, as though, as if, even though, even if, so that, etc.
  • Related word units mainly include: single words that introduce a clause, phrases that introduce a clause, and related words that connect parallel sentences.
  • A predicate verb unit may be a single verb or a verb phrase, for example: can do, do.
  • the predicate is defined as the main action language in a natural sentence in English.
  • the structure usually consists of two parts: the auxiliary verb + the real verb (except the main table structure).
  • the format requirements for predicate states and voices are defined by the formula of computational linguistics as follows:
  • A noun pronoun unit may be: a pure noun phrase (a noun phrase not contained in a prepositional phrase); a nominalized verb phrase (a verb phrase that has the nature of a noun and can act as a subject, an object, or another nominal syntactic component, including infinitive phrases and gerund phrases); or a pronoun that can be used alone. Examples of noun pronoun units: food, wolf, the men, me, it, this, to do, etc.
  • the nominal verb phrase has a format requirement, and the formula for computational linguistics is defined as follows:
  • Predicate verb unit
  • k: the order number of the predicate verb unit currently being processed
  • Lead: subordinate unit
  • NPI: pure noun unit
  • Conj: parallel word unit
  • VNP: noun-like (nominalized) verb unit
  • NOMP: subject pronoun unit
  • OBJP: object pronoun unit
  • NP: general term for noun pronoun units
  • The sets of word units satisfy the following relationship:
  • NP = NPI ∪ VNP ∪ NOMP ∪ OBJP.
  • Part B2 defines important concepts
  • a clause is a simple sentence, that is, the most basic sentence of natural language.
  • a clause is a set of subject-predicate collocation structure.
  • the above three types of word units constitute the backbone of natural language sentence clauses, wherein the predicate verb unit acts as a predicate, and the noun pronoun unit acts as a subject or object.
  • The variables are defined as x, y, z, where x is the leader element, y is the subject element, and z is the object element; with r denoting the predicate element, the subject-predicate structure in each statement can be expressed as:
  • ⁇ , ⁇ , ⁇ , ⁇ respectively represent any component or punctuation other than x, y, r, z, referred to as impurities, and the impurities can be removed by the existing sentence pretreatment technique.
  • The leader element x is a component of a simple sentence: when the simple sentence is a clause, the leader element is the connective pronoun that introduces the clause, or the connective adverb or connective phrase that introduces the clause; when the simple sentence is a parallel sentence, the leader element is the coordinating conjunction connecting it with the preceding parallel sentence. That is, in a simple sentence, the leader element x is a syntactic component composed of related word units that introduces the simple sentence that follows it.
  • Part B3 generates three key sets: {x_k}, {y_k}, {z_k}
  • syntactic order values in the original sentence S are arranged from small to large. You may wish to set Orderly pair Formulating a continuous string of words among them Is the slave in the original sentence S To A set of adjacent consecutive word strings or empty word strings. Exhaustion of such ordered pairs and continuous word string formulas.
  • This method can also be used to generate other types of side-by-side components, such as generating side-by-side adjective phrases.
  • the entire NPI, the entire VNP, and the entire NOMP phrase in the method are replaced by the entire NPI, the entire VNP, and the entire NOMP phrase in the original sentence, the corresponding syntax component can be obtained.
  • Unary function A(S), A(S) indicates that all NPI phrases, all VNP phrases, and all NOMP phrases in the original sentence S are taken out, and all NPI phrases, all VNP phrases, and all NOMP phrases in the original sentence are also taken.
  • ⁇ 1 , ..., ⁇ m-1 , ⁇ m ⁇ , m ⁇ N, m is the number of elements in the set ⁇ .
  • the binary function K( ⁇ , ⁇ ), K( ⁇ , ⁇ ) represents the result of the unary function B( ⁇ ), that is, the given one By element
  • the syntactic order values in the original sentence S are arranged from small to large. You may wish to set Orderly pair Set collection Then establish a continuous string formula among them Is the slave in the original sentence S To a set of adjacent consecutive or empty words, and then
  • H( ⁇ t ) represents the generation of the binary function K( ⁇ , ⁇ )
  • N( ⁇ , ⁇ ), N( ⁇ , ⁇ ) represents the result of the binary function M( ⁇ , ⁇ ) That is, for any collection If collection There is a corresponding collection family Then construct a new collection as follows then
  • the unary function u( ⁇ ), u( ⁇ ) represents the result of the binary function N( ⁇ , ⁇ ) take Assume For the given element ⁇ , There are ⁇ ( ⁇ ) ⁇ ⁇ ( ⁇ ). then
  • V( ⁇ ), V( ⁇ ) represents the result of the binary function N( ⁇ , ⁇ ) take Assume For the given element ⁇ , There are ⁇ ( ⁇ ) ⁇ ⁇ ( ⁇ ). then
  • the word sequence table is:
  • NPI yk ⁇ NPI
  • VNP yk ⁇ VNP
  • NOMP k ⁇ NOMP
  • statement S can be expressed in a matrix form, namely:
  • f j acts as a subject element or an object element of another function f k
  • f(f) the compound operation
  • Each English sentence S that does not omit the predicate verb can be regarded as a result of a finite number of compounding and partial addition operations by n functions f 1 , ..., f n (n is equal to the number of predicate verb units). According to this, any English sentence S that does not omit the predicate can be recorded as:
  • any English sentence that does not omit the predicate is obtained by a composite or partial addition operation of a vector including a guide element, a subject element, a predicate element, or an object element.
  • Each linear expression of the English natural sentence S that does not omit the predicate contains a finite number of partial addition operations and compound operations.
  • This paper uses a linear expression as a supplementary expression of the natural sentence S.
  • Possible matrix solutions of the syntactic structure that fail these checks are excluded; for example, for the possible matrix solutions below:
  • the word unit numbered 4 does not appear, so the solution is excluded.
  • the word unit numbered 5 appears twice, so the solution is excluded.
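  • The two checks just illustrated (a missing order value and a repeated order value) can be sketched as follows. This is a simplified illustration: `order_values` and `coverage_problems` are hypothetical names, each syntax vector is assumed to be a tuple of word unit numbers, and `None` stands for the empty unit.

```python
from collections import Counter


def order_values(solution):
    """Flatten a possible matrix solution into the order values it uses."""
    return [v for row in solution for v in row if v is not None]


def coverage_problems(solution, k):
    """Return (missing, repeated) order values for a sentence with k word units."""
    counts = Counter(order_values(solution))
    missing = [n for n in range(1, k + 1) if counts[n] == 0]
    repeated = sorted(n for n, c in counts.items() if c > 1)
    return missing, repeated


def passes_coverage(solution, k):
    """Keep a solution only if every order value 1..k appears exactly once."""
    missing, repeated = coverage_problems(solution, k)
    return not missing and not repeated


missing_four = [(None, 1, 2, 3), (None, 5, 6, None)]   # order value 4 never appears
repeated_five = [(None, 1, 2, 5), (3, 5, 4, None)]     # order value 5 appears twice
acceptable = [(None, 1, 2, None), (3, 4, 5, 6)]
```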
  • The syntax vectors whose positions can be clearly determined are all substituted by equivalent substitution. If two syntax vectors cross-contradict each other after the equivalent substitution, the possible matrix solution is excluded.
  • For any possible matrix solution, if there is a syntax vector that has no substitution relationship with other syntax vectors, an insertion operation is performed to obtain all possible syntactic parsing structures corresponding to the possible matrix solution, and it is verified whether the statement obtained from each possible syntactic parsing structure is identical to the preprocessed statement, further comprising:
  • 5.5.2 take a second type of syntax vector Mark one by one according to the predetermined direction The order value of each syntax element in the message; after appending the order value of the syntax element, take any The i-th syntax element in the construct, only a unique gap is constructed on the first side of the syntax element; after the void, take a syntax vector Second type of syntax vector Syntactic vector in the form of overall insertion Insert the constructed vacancy, and then generate a new syntax vector, record this new syntax vector as The syntactic vectors obtained by inserting the whole into space are collectively referred to as the third type of syntax vector;
  • the third type of syntax vector Pair vector from the predetermined direction
  • the first syntactic element on the first side starts into the vector Vector contained in Each of the syntax elements up to the first syntactic element on the second side, all of which are labeled with a sequence value; Vector contained in The element on the first side, without the order value; the vector The first syntax element on the second side is marked as Will be vectored as described above
  • the annotated part of the syntax vector is denoted the tail syntax vector
  • Each syntax element in the label is labeled with a sequential value; after the order value of the syntax element is annotated, take one The t-th syntax element in the construct, constructing a unique vacancy on one side of the syntactic element; after creating an empty space, taking an unused second-class syntactic vector The vector is inserted as a whole Insert the previously constructed gap and generate a new vector, then the new vector is recorded as
  • the noun pronoun unit acts as a re-examination and trade-off between the subject and the object.
  • the correction can overcome the problem of irregular structure of some statements and improve the accuracy of analysis.
  • the syntax structure can be formed into a syntax tree data structure according to the analysis result.
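  • A syntax tree built from nested syntax vectors might be rendered as indented text along the following lines. The `SyntaxVector` class and `render` function are hypothetical names for this sketch; the only assumption taken from the text is that a syntax vector may itself serve as the subject or object element of another syntax vector.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class SyntaxVector:
    guide: Optional[str]
    subject: Union[str, "SyntaxVector", None]
    predicate: str
    obj: Union[str, "SyntaxVector", None]


# (attribute name, label shown in the tree)
FIELDS = (("guide", "guide"), ("subject", "subject"),
          ("predicate", "predicate"), ("obj", "object"))


def render(node, role="sentence", depth=0):
    """Return the tree as a list of indented lines; empty units are skipped."""
    pad = "  " * depth
    if node is None:
        return []
    if isinstance(node, str):
        return [f"{pad}{role}: {node}"]
    lines = [f"{pad}{role}:"]
    for attr, label in FIELDS:
        lines += render(getattr(node, attr), label, depth + 1)
    return lines


# "I know that he likes food": the clause is the object of the main verb.
clause = SyntaxVector("that", "he", "likes", "food")
sentence = SyntaxVector(None, "I", "know", clause)
tree_lines = render(sentence)
```

  • Printing `"\n".join(tree_lines)` gives a text view that a human-computer interface could replace with a graphical tree.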
  • the order value sequence of the original sentence, 1, 2, ..., k can be regarded as an equivalent substitution of the syntactic vector in which the explicit position is found in the possible matrix solution and the unidentified position in the possible matrix solution
  • a finite number of global interpolations between syntax vectors That is, the initial syntax vector corresponding to the original sentence It can be seen as an equal-substitution of the syntactic vector in which the explicit position is found in the possible matrix solution, and then through the finite sub-interpolation between the syntactic vectors in the possible matrix solution where the clear position cannot be found. .
  • a variety of different insertions are essentially the permutations and combinations in combinatorial mathematics.
  • For any possible matrix solution, if there is a syntax vector with no explicit substitution relationship with any other syntax vector, first apply equivalent substitution to all the syntax vectors in the possible matrix solution that do have substitution relationships with other syntax vectors, while keeping the syntax vectors that have no substitution relationship with other syntax vectors. Combining these two aspects, the possible matrix solution is transformed into a group of syntax vectors that have no substitution relationship with one another.
  • The original syntax vectors f_1, f_2, ... in the possible matrix solution are called the first type of syntax vector. The group of syntax vectors with no mutual substitution relationship obtained through the equivalent substitution above is collectively called the second type of syntax vector; it is emphasized that the second-type syntax vectors have no substitution relationship with one another.
  • ⁇ ⁇ 2 the overall insertion is meaningful; the following discussion all preset ⁇ ⁇ 2.
  • Take any second-type syntax vector and label the order value of each syntax element in it one by one, from right to left (labeling from left to right is also possible).
  • the third type of syntactic vector, the newly obtained third type of syntactic vector is recorded as [ ⁇ ] i + ⁇ ; emphasize that the third type of syntactic vectors are all syntactic vectors that do not have substitution relations with each other. The first emptying and emptying steps are completed.
  • Second, for the third-type syntactic vector obtained from the first gap-construction and insertion step, proceed from right to left (left to right is also possible, but the direction must be the same as in the previous labeling): starting from the first syntax element on the right of the vector and continuing up to the first syntax element on the left of the previously inserted vector contained in it, label every syntax element with an order value.
  • The syntax elements to the left of the contained vector receive no order value, and the first syntax element on the left of the contained vector is given its own designation.
  • The part of the syntactic vector labeled as described above is called the tail vector.
  • After the order values are labeled, take a syntax element in the tail vector and construct a unique gap on one side of it; then take another second-type syntactic vector and insert it as a whole into the constructed gap, generating a new syntactic vector. For any given syntactic vectors α and β, the first syntax element counted from the left of the vector β is denoted ⁇ (β); if the syntactic vector [α] i + β exists, it is labeled in the manner described above, and the labeled part is denoted as the vector [ ⁇ k ⁇ ( ⁇ k-1 )], which is called the tail vector.
  • This completes the second gap-construction and insertion step.
  • Thereafter, the tail vector is selected from the third-type syntactic vector obtained in the previous gap-construction and insertion step and is labeled with order values according to the foregoing method, in the same direction as the previous labeling. After the order values are assigned, take a syntax element in the tail vector and construct a unique one-sided gap according to the foregoing method, on the same side as the previous gap. Then take a second-type syntactic vector that was not used in the previous gap-construction and insertion steps and insert it as a whole into the constructed gap, generating a new syntactic vector.
  • The foregoing operation is repeated: each time a gap-construction and insertion step is completed, the next gap-construction and insertion operation is performed on the third-type syntactic vector just obtained, until all second-type syntactic vectors have been inserted and a single-row third-type syntactic vector is obtained. This last third-type syntactic vector is called the final single-row vector.
  • The complete process, from the first selection of a second-type syntactic vector to the generation of the final single-row vector, is taken as one specific scheme, so that each of the aforementioned gap-construction and insertion steps is also a step of that specific scheme.
  • A final single-row vector in which no two positions appear in reversed order conforms to the natural order and is a reasonable final single-row vector; the reasonable final single-row vector is retained as one of the correct results, and the syntactic structure is retained.
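  • As a non-limiting illustration, the ordered insertion procedure described above may be sketched as follows. The sketch assumes that each second-type syntactic vector is a list of word-unit numbers, that labeling runs from right to left, and that every gap is constructed on the right side of the chosen element. The names `ordered_insertions` and `reasonable_rows` are hypothetical and do not come from the specification.

```python
from itertools import permutations


def ordered_insertions(vectors):
    """Return every final single-row vector for one fixed order of vectors."""
    # Each state is (row, start); `start` is the index of the leftmost element
    # of the most recently inserted vector (0 for the initial vector).
    states = [(list(vectors[0]), 0)]
    for beta in vectors[1:]:
        next_states = []
        for row, start in states:
            # The tail vector is row[start:]. A gap lies to the right of one
            # of its elements, i.e. at insert positions start+1 .. len(row).
            for pos in range(start + 1, len(row) + 1):
                next_states.append((row[:pos] + list(beta) + row[pos:], pos))
        states = next_states
    return [row for row, _ in states]


def reasonable_rows(vectors):
    """Keep only rows whose numbers are strictly increasing (no reversal)."""
    rows = []
    for mode in permutations(vectors):  # every order of the vectors
        for row in ordered_insertions(mode):
            if all(a < b for a, b in zip(row, row[1:])):
                rows.append(row)
    return rows


rows = reasonable_rows([[1, 4], [2, 3]])
```

  • In this sketch the tail restriction forces the first elements of the inserted vectors to appear in insertion order, which is why different orders of the same vectors cannot produce the same row.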
  • For any possible matrix solution, if there is a syntactic vector that has no explicit substitution relationship with any other syntactic vector, equal-substitution is first applied to all syntactic vectors in the possible matrix solution that do have substitution relationships with other syntactic vectors, while the syntactic vectors that have no substitution relationship with other syntactic vectors are kept unchanged. Combining these two aspects, the possible matrix solution is transformed into a group of syntactic vectors that have no substitution relationship with one another.
  • The original syntactic vectors f 1 , f 2 , . . . of the possible matrix solution are collectively referred to as the first type of syntactic vectors.
  • The syntactic vectors obtained after the aforementioned equal-substitution, which have no substitution relationship with one another, are collectively referred to as the second type of syntactic vectors; it is emphasized that the second type of syntactic vectors never have substitution relationships with each other.
  • The overall insertion is meaningful only when there are at least 2 second-type syntactic vectors; the following discussion presupposes this throughout.
  • The one-sided unordered overall insertion, also referred to as one-sided non-order-preserving overall insertion, proceeds as follows: take any second-type syntactic vector and, proceeding from right to left (left to right is also possible), label each syntax element in it with an order value one by one.
  • Second, for the third-type syntactic vector obtained from the first gap-construction and insertion step, proceed from right to left (left to right is also possible, but the direction must be the same as in the previous labeling) and label every syntax element in the vector with an order value.
  • After the order values are labeled, take one syntax element, say the t-th syntax element from the right, and construct a unique gap only on the right side of that element (the left side is also possible, but it must be the same side as in the previous gap construction). After the gap is constructed, take a second-type syntactic vector other than those used in the first gap-construction and insertion step and insert it as a whole into the constructed gap, generating a new vector. This completes the second gap-construction and insertion step.
  • Thereafter, the third-type syntactic vector obtained from the previous gap-construction and insertion step is labeled with order values, in the same direction as the previous labeling.
  • After the order values are labeled, take a syntax element in the third-type syntactic vector and construct a unique one-sided gap according to the above method, on the same side as the previous gap.
  • After the gap is constructed, take a second-type syntactic vector other than those used in the previous gap-construction and insertion steps and insert it as a whole into the constructed gap.
  • The foregoing operation is repeated on each newly obtained third-type syntactic vector until all second-type syntactic vectors have been inserted and a single-row third-type syntactic vector is obtained; this last third-type syntactic vector is called the final single-row vector.
  • The complete process, from the first selection of a second-type syntactic vector to the generation of the final single-row vector, is taken as one specific scheme, so that each of the aforementioned gap-construction and insertion steps is also a step of that specific scheme.
  • A final single-row vector in which no two positions appear in reversed order conforms to the natural order and is a reasonable final single-row vector. The reasonable final single-row vector is retained as one of the correct results, and the corresponding syntactic structure possible matrix solution is retained as one of the correct results, in order to generate a syntax tree.
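  • As a non-limiting illustration, the unordered insertion procedure may be sketched as follows for one fixed order of second-type syntactic vectors. The sketch assumes each vector is a list of word-unit numbers and that the gap is constructed on the right side of any element of the whole current vector; the name `unordered_insertions` is hypothetical.

```python
def unordered_insertions(vectors):
    """Return every row produced when the gap may follow any element."""
    rows = [list(vectors[0])]
    for beta in vectors[1:]:
        rows = [
            row[:pos] + list(beta) + row[pos:]
            for row in rows
            # pos counts the elements left of the gap; the gap sits to the
            # right of an existing element, so pos runs from 1 to len(row).
            for pos in range(1, len(row) + 1)
        ]
    return rows


rows = unordered_insertions([[1], [2, 3], [4]])
```

  • In this sketch each step offers as many gaps as the current vector has elements, so the number of rows for one order is the product of the successive vector lengths.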
  • The order value sequence of the original sentence, 1, 2, ..., k, can be regarded as obtained in two stages: first by equal-substitution of the syntactic vectors in the possible matrix solution whose explicit positions can be found, and then by a finite number of overall insertions between the syntactic vectors in the possible matrix solution whose explicit positions cannot be found. That is, the initial syntactic vector corresponding to the original sentence can be seen as the result of that equal-substitution followed by that finite number of overall insertions.
  • The various different insertions are, in essence, permutations and combinations in the sense of combinatorial mathematics.
  • The first method described above restricts the choice of the element at which the gap is constructed and in effect requires that the order of the inserted syntactic vectors be preserved; the second method described above places no restriction on that choice and in effect does not require the order of the inserted syntactic vectors to be preserved.
  • Consequently, the final single-row vectors generated by the first method are all different from one another, whereas the final single-row vectors generated by the second method may coincide, so the redundant identical final single-row vectors are removed.
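  • As a non-limiting illustration, removing the redundant identical final single-row vectors may be sketched as follows. The sketch assumes each final single-row vector is a list of word-unit numbers; the name `distinct_rows` is hypothetical.

```python
def distinct_rows(rows):
    """Keep the first copy of each final single-row vector, in order."""
    # Lists are not hashable, so each row is compared as a tuple.
    return [list(key) for key in dict.fromkeys(tuple(row) for row in rows)]


kept = distinct_rows([[1, 2, 3], [1, 3, 2], [1, 2, 3]])
```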
  • A parallel noun-pronoun combination vector and a related-word combination vector are each regarded as a whole, and no other syntactic vector can be inserted into them.
  • each ⁇ element map ⁇ j be from the set ⁇ t 1 , t 2 ,...,t ⁇ ⁇ to the set
  • The set { t 1 , t 2 , ..., t ⁇ } is used only to represent the domain of the mapping ⁇ j , and thereby to index the full arrangements of the ⁇ elements of the target set; it has no other practical meaning.
  • The mappings are constructed as follows: j ∈ N, 1 ≤ j ≤ ⁇ !. For any given j 1 and j 2 with j 1 ∈ N, j 2 ∈ N, 1 ≤ j 1 ≤ ⁇ !, 1 ≤ j 2 ≤ ⁇ !, if j 1 ≠ j 2 , then ⁇ j 1 ≠ ⁇ j 2 . ( ⁇ ≥ 2)
  • Any one arrangement of the ⁇ second-type syntactic vectors used in the complete operational flow, from the first selection of a second-type syntactic vector to the generation of the final single-row vector, is taken as a scheme mode.
  • A scheme mode means that the first step selects two vectors and inserts one into the other by one-sided order-preserving overall insertion;
  • the second step selects a further vector and inserts it, by one-sided order-preserving overall insertion, into the new syntactic vector generated in the first step, and so on;
  • the ⁇ -th step selects the remaining vector and inserts it, by one-sided order-preserving overall insertion, into the new syntactic vector generated by the ( ⁇ -1)-th step.
  • Any ⁇ -element mapping ⁇ j is a scheme mode, and the set of all ⁇ -element mappings ⁇ j is the set of all scheme modes; the total number of scheme modes is therefore ⁇ !. ( ⁇ ≥ 2)
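  • As a non-limiting illustration, the scheme modes can be enumerated as the permutations of the second-type syntactic vectors, and their count is the factorial of the number of vectors. The vector labels and the name `scheme_modes` below are hypothetical.

```python
from itertools import permutations
from math import factorial


def scheme_modes(vectors):
    """Return every order in which the given vectors can be taken."""
    return list(permutations(vectors))


modes = scheme_modes(["v1", "v2", "v3"])
count_matches = len(modes) == factorial(3)
```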
  • Definition 1.2 An operation that, following any one scheme mode, inserts a vector into a gap and generates a new vector is taken as one step of that scheme mode.
  • In the record of the k-th step performed according to ⁇ j , [n k ] indicates that there are n k possible choices in total for the k-th step.
  • The insertion recursive algorithm constructed below is the k-th step performed according to the scheme mode ⁇ j described above, where k is the number of times the insertion recursive algorithm has been run, that is, the number of times the aforementioned one-sided order-preserving overall insertion operation has been performed.
  • Definition 1.4 Given a syntactic vector α, the unary function W denotes taking out the syntactic vector α and marking it.
  • W(α) = α k denotes taking out the syntactic vector α and marking it as α k ; α k is called the input vector.
  • Definition 1.5 Given a syntactic vector β, the unary function Q denotes taking out the syntactic vector β and marking it.
  • While the recursive algorithm is running, the syntactic vector β k is inserted into the syntactic vector α k .
  • Z denotes the binary function that labels order values on the syntactic vector α k : the first syntax element of α k counted from the right is labeled with order value 1, and the order values 2, 3, ... are then labeled from right to left, up to and including the first syntax element from the left of the vector β k-1 contained in α k .
  • The first syntax element from the left of the vector β k-1 is denoted ⁇ (β k-1 ), and its order value is denoted n k ; n k is then the maximum order value labeled.
  • the element ⁇ (β k-1 ) represents the vector β k-1
  • Running the binary function Z k (α, β) on α k gives ⁇ (β k-1 )<n k > ... b 2 <2> b 1 <1>; this result is recorded as the vector [α k ⁇ (β k-1 )] and is called the tail vector.
  • The binary function T denotes the overall insertion operation on the tail vector.
  • The i k -th syntax element counted from the right is selected on the tail vector, a unique gap is constructed on the right side of that i k -th syntax element, and the vector β k is then inserted into the gap as a whole.
  • Running the binary function T k (α, β) on the vector [α k ⁇ (β k-1 )] and the vector β k inserts β k as a whole into
  • the gap corresponding to the i k -th syntax element from the right. The new vector obtained by this insertion is recorded and is called the output vector.
  • Here α and β denote independent variables; they are abstract placeholders that may be reused freely, so the above notation is not contradictory.
  • The characteristic of this algorithm is that the above-mentioned one-sided order-preserving overall insertion operation is decomposed into four stages: (1) take the vector to be inserted into; (2) take the vector to be inserted; (3) label order values on the specified syntax elements of the vector to be inserted into, and intercept the tail vector; (4) arbitrarily select a syntax element in the tail vector, construct a unique gap on its right side, and insert the vector to be inserted into that gap as a whole.
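  • As a non-limiting illustration, stages (3) and (4) may be sketched as follows. The sketch assumes each vector is a list of word-unit numbers and that `start` is the index of the first syntax element from the left of the previously inserted vector; the names `tail_vector` and `insert_right_of` are hypothetical.

```python
def tail_vector(row, start):
    """Label elements from the right with 1, 2, ... down to index `start`."""
    tail = row[start:]
    return [(elem, order) for order, elem in enumerate(reversed(tail), start=1)]


def insert_right_of(row, i, beta):
    """Insert beta whole into the gap right of the i-th element from the right.

    Returns the new row and the index of beta's first element in it.
    """
    pos = len(row) - i + 1
    return row[:pos] + list(beta) + row[pos:], pos


labeled = tail_vector([1, 2, 5, 6], 1)
new_row, new_start = insert_right_of([1, 2, 5, 6], 2, [3, 4])
new_tail = tail_vector(new_row, new_start)
```

  • In this sketch the new tail vector holds the elements of the inserted vector plus the i - 1 elements to the right of the gap.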
  • Let n ⁇ be the number of syntax elements in the tail vector selected in the ⁇ -th step,
  • and let i ⁇ be the order value of the element corresponding to the gap constructed in the ⁇ -th step.
  • ⁇ 3 , 4 , 2 > operates as follows: (n 2 ∈ N, i 2 ∈ N, 1 ≤ i 2 ≤ n 2 )
  • ⁇ 3 , 4 , 2 > operates as follows: (n 3 ∈ N, i 3 ∈ N, 1 ≤ i 3 ≤ n 3 )
  • A parallel noun-pronoun combination vector and a related-word combination vector are each regarded as a whole, and no other syntactic vector can be inserted into them.
  • Definition 1.9 Run the insertion recursion calculation
  • Suppose that n tail vectors are intercepted in the k-th step, and denote them ⁇ 1 , ⁇ 2 , . . . , ⁇ n . Overall insertion is performed on each of these vectors in the k-th step, and the total number of insertions over all of them is denoted ⁇ [ ⁇ (k)].
  • the tail vector of the kth step Syntactic element Represents the element contained in the vector ⁇ k , ⁇ k-1 > represents the element of the vector ⁇ k-1 in the vector ⁇ k Corresponding vacancies.
  • The number of syntax elements in the tail vector [ ⁇ k ⁇ ( ⁇ k-1 )] is therefore ⁇ [ ⁇ k ]+(i k -1), which establishes the conclusion to be proved.
  • Lemma 1.2 Given k, with k ∈ N and k ≥ 1, the maximum order value of the syntax elements of the tail vector intercepted in the k-th step is:
  • The number of syntax elements contained in the tail vector [ ⁇ k ⁇ ( ⁇ k-1 )] of the k-th step is:
  • Hence the maximum order value of the syntax elements of the tail vector intercepted in the k-th step is:
  • Lemma 1.3 Insertion recursive algorithm Generated specific plan The number of ⁇ i 1 ,i 2 ,...,i k > is: (definition of ⁇ , see definition 2.5) (k ⁇ N,k ⁇ 1)
  • By the recursive definition of T k (α, β), the range of i 1 is 1 ≤ i 1 ≤ n 1 , i.e. 1 ≤ i 1 ≤ ⁇ [ ⁇ 1 ].
  • The total number of insertions in the first step is therefore ⁇ [ ⁇ 1 ]; that is, ⁇ 1 admits ⁇ [ ⁇ 1 ] insertions of ⁇ 1 .
  • the induction hypothesis has provided an expression of the formula ⁇ [ ⁇ k-1 ]+(i k-1 -1), that is, ⁇ [ ⁇ k-1 ]+(i k-1 - can be determined by the inductive hypothesis 1) All values. So, according to the way of summation, the count is accumulated, from the formula Starting, based on the assumption of induction hypothesis Traversing i k-1, whil, i full section 2, i 1 values, thereby erasing the parameter i k-1, Hence, i 2, i 1, directly calculate: a first The total number of insertions in (k+1) steps is:
  • Insertion recursive algorithm Definition the total number of insertions in the kth step ⁇ [ ⁇ (k)], which is the algorithm The total number of insertions in the last step ⁇ [ ⁇ (k)], and the algorithm Generated specific plan The number of the numbers is equal, combining the above results, the interpolation recursive algorithm Generated specific plan The number is: (the definition of ⁇ , see definition 2.5) (k ⁇ N, k ⁇ 1)
  • The newly generated syntactic vectors for which no explicit position can be found are collectively referred to as the second type of syntactic vectors.
  • Any syntactic vector that is eliminated in the aforementioned equal-substitution process is called a predecessor syntax vector.
  • The number of predecessor syntactic vectors f eliminated is denoted u ⁇ ; the second-type syntactic vector is then obtained through u ⁇ equal-substitutions.
  • Theorem 1.3 Give a second type of syntax vector will The number of syntax elements included in the record is Syntactic vector The number of the predecessor syntax vector f that is eliminated is denoted as u ⁇ , then Meet the recurrence formula:
  • the above formula can also be converted into a form represented by the number of predecessor syntax vectors.
  • The insertion recursive algorithm constructed below is the k-th step performed according to the scheme mode ⁇ j , where k is the number of times the insertion recursive algorithm has been run, that is, the number of times the aforementioned one-sided non-order-preserving overall insertion operation has been performed.
  • Definition 2.1 Given a syntactic vector α, the unary function W denotes taking out the syntactic vector α and marking it.
  • Definition 2.2 Given a syntactic vector β, the unary function Q denotes taking out the syntactic vector β and marking it.
  • Q(β) = β k denotes taking out the syntactic vector β and marking it as β k .
  • While the recursive algorithm is running, the syntactic vector β k is inserted into the syntactic vector α k .
  • Z denotes the unary function that labels order values on the syntactic vector α k : the first syntax element from the left of α k is labeled with order value 1, and the order values 2, 3, ... are then labeled successively from left to right until all syntax elements in α k are labeled. The maximum order value labeled is recorded as n k . Running the unary function Z gives:
  • The binary function T indicates that, after the unary function Z has been applied to the syntactic vector α k , the m k -th element from the left is selected on α k and a unique gap is constructed on the right side of that m k -th element; the syntactic vector β k is then inserted into the gap as a whole. The new vector obtained after the insertion is recorded. Running the binary function T gives:
  • Here α and β denote independent variables; they are abstract placeholders that may be reused freely, so the above notation is not contradictory.
  • Lemma 2.2 Let: (k ∈ N, k ≥ 1)
  • The syntactic vector ⁇ k represents the syntactic vector obtained from ⁇ j(t 1 ), ⁇ j(t 2 ), ..., ⁇ j(t k ) by successive one-sided non-order-preserving overall insertions.
  • m 1 , m 2 , ..., m k-1 respectively denote the order values of the arbitrarily chosen gaps in the corresponding vectors.
  • Theorem 2.1 The number of specific schemes generated by any one of the scheme modes ⁇ j in this method is recorded as ⁇ [ ⁇ j], then:
  • Theorem 2.2 The number of final single-row vectors generated by any one of the scheme modes ⁇ j in this method is recorded as ⁇ [ ⁇ j], then:
  • The number of final single-row vectors is equal to the number of specific schemes; the result then follows from Theorem 2.1.
  • Where two or more final single-row vectors are identical, one is kept and the other identical final single-row vectors are deleted. This finally leaves 210 mutually distinct final single-row vectors, which agree completely with the result of Method 1.
  • the example is complete.
  • the present invention is based on syntactic analysis of the pre-processed statements represented by the above data structures to obtain the component relationships of the various word units in the sentences.
  • FIG. 1 is a flow chart of a method for parsing a computer-based natural language syntax structure according to an embodiment of the present invention. As shown in FIG. 1, the method includes:
  • Step 110: Read the pre-processed statement data structure to be parsed. The pre-processed statement data structure includes only the related-word units, predicate verb units, and noun-pronoun units of the sentence, and each word unit is numbered and labeled according to its order in the pre-processed statement.
  • Step 120: For each predicate verb unit, generate a corresponding guide element, subject element, predicate element, and object element.
  • The possible value of the guide element is one of the parallel related-word units or subordinate related-word units whose number is smaller than the corresponding predicate verb unit number; or one of the related-word combination vectors composed of a parallel related-word unit whose number is smaller than the corresponding predicate verb unit number and a subordinate related-word unit whose number is smaller than the corresponding predicate verb unit number and greater than the number of that parallel related-word unit; or an empty unit;
  • the possible value of the subject element is one of the noun-pronoun units whose number is smaller than the corresponding predicate verb unit number, or one of the parallel noun-pronoun combination vectors, contained in the total parallel noun-pronoun combination vector family, whose largest word-unit number is smaller than the corresponding predicate verb unit number;
  • the predicate element is the corresponding predicate verb unit;
  • the possible value of the object element is one of the noun-pronoun units whose number is greater than the corresponding predicate verb unit number and smaller than the adjacent predicate verb unit number, or a parallel noun-pronoun combination vector whose smallest word-unit number is greater than the corresponding predicate verb unit number.
  • The corresponding guide element, subject element, and object element are thus generated on the basis of the position number of each predicate element.
  • the set of related word units corresponding to each predicate verb unit r k is:
  • The guide element corresponding to the predicate verb unit r k is x k , and its set of possible values is {x k }.
  • Generating the set of possible values of the guide element x k corresponding to the predicate verb unit r k preferably includes:
  • the possible value of the guide element is one of the parallel related-word units or subordinate related-word units whose number is smaller than the corresponding predicate verb unit number; or one of the related-word combination vectors composed of a parallel related-word unit whose number is smaller than the corresponding predicate verb unit number and an adjacent subordinate related-word unit whose number is smaller than the corresponding predicate verb unit number and greater than the number of that parallel related-word unit; or an empty unit.
  • The subject element corresponding to the predicate verb unit r k is y k , and its set of possible values is {y k }.
  • Generating the corresponding subject element y k preferably includes:
  • the possible value of the subject element is one of the noun-pronoun units whose number is smaller than the corresponding predicate verb unit number, or one of the parallel noun-pronoun combination vectors whose largest word-unit number is smaller than the corresponding predicate verb unit number, or an empty unit.
  • The set NPI yk represents the set of pure noun units whose number is smaller than the corresponding predicate verb unit number;
  • VNP yk represents the set of verb units of nominal nature whose number is smaller than the corresponding predicate verb unit number;
  • NOMP k represents the set of nominative-case pronoun units whose number is smaller than the corresponding predicate verb unit number;
  • G k represents the union of the parallel noun-pronoun combination vectors, in the total parallel noun-pronoun combination vector family, whose largest word-unit number is smaller than the corresponding predicate verb unit number;
  • e represents the empty unit.
  • More preferably, the possible value of the subject element is one of the noun-pronoun units whose number is smaller than the corresponding predicate verb unit number; or one of the parallel noun-pronoun combination vectors, contained in the total parallel noun-pronoun combination vector family, whose largest word-unit number is smaller than the corresponding predicate verb unit number; or one of the syntactic vectors corresponding to a predicate verb unit that appears earlier; or an empty unit.
  • The set NPI yk represents the set of pure noun units whose number is smaller than the corresponding predicate verb unit number;
  • VNP yk represents the set of verb units of nominal nature whose number is smaller than the corresponding predicate verb unit number;
  • NOMP k represents the set of nominative-case pronoun units whose number is smaller than the corresponding predicate verb unit number;
  • G k represents the union of the parallel noun-pronoun combination vectors, in the total parallel noun-pronoun combination vector family, whose largest word-unit number is smaller than the corresponding predicate verb unit number;
  • fy k represents the set of syntactic vectors corresponding to the predicate verb units that appear earlier;
  • e represents the empty unit.
  • The object element corresponding to the predicate verb unit r k is z k , and its set of possible values is {z k }.
  • Generating the corresponding object element z k preferably includes:
  • the possible value of the object element is one of the noun-pronoun units whose number is greater than the corresponding predicate verb unit number, or one of the parallel noun-pronoun combination vectors whose smallest word-unit number is greater than the corresponding predicate verb unit number, or an empty unit.
  • The set NPI zk represents the set of pure noun units whose number is greater than the corresponding predicate verb unit number;
  • VNP zk represents the set of verb units of nominal nature whose number is greater than the corresponding predicate verb unit number;
  • OBJP k represents the set of objective-case pronoun units whose number is greater than the corresponding predicate verb unit number;
  • H k represents the union of the parallel noun-pronoun combination vectors, in the total parallel noun-pronoun combination vector family, whose smallest word-unit number is greater than the corresponding predicate verb unit number;
  • e represents the empty unit.
  • More preferably, the possible value of the object element is one of the noun-pronoun units whose number is greater than the corresponding predicate verb unit number and smaller than the adjacent predicate verb unit number; or one of the parallel noun-pronoun combination vectors, contained in the total parallel noun-pronoun combination vector family, whose smallest word-unit number is greater than the corresponding predicate verb unit number and smaller than the adjacent predicate verb unit number; or one of the syntactic vectors corresponding to a predicate verb unit that appears later; or an empty unit.
  • The set NPI zk represents the set of pure noun units whose number is greater than the corresponding predicate verb unit number and smaller than the adjacent predicate verb unit number;
  • VNP zk represents the set of verb units of nominal nature whose number is greater than the corresponding predicate verb unit number and smaller than the adjacent predicate verb unit number;
  • OBJP k represents the set of objective-case pronoun units whose number is greater than the corresponding predicate verb unit number and smaller than the adjacent predicate verb unit number;
  • H k represents the union of the parallel noun-pronoun combination vectors whose smallest word-unit number is greater than the corresponding predicate verb unit number and smaller than the adjacent predicate verb unit number;
  • fz k represents the set of syntactic vectors corresponding to the predicate verb units that appear later;
  • e represents the empty unit.
  • Through the processing in step 120, a set of possible values can be generated for each element of the above example.
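  • As a non-limiting illustration, a simplified form of the candidate generation in step 120 may be sketched as follows. The sketch assumes the word units are given in number order as (number, kind) pairs, covers only single noun-pronoun units and the empty unit (written `None`), and omits guide elements, combination vectors, and nested syntactic vectors. The kind labels and the name `element_candidates` are hypothetical.

```python
def element_candidates(units):
    """Map each predicate verb number to its candidate element values."""
    verbs = [num for num, kind in units if kind == "verb"]
    nouns = [num for num, kind in units if kind == "noun"]
    result = {}
    for idx, r in enumerate(verbs):
        # Number of the adjacent (next) predicate verb, if there is one.
        next_r = verbs[idx + 1] if idx + 1 < len(verbs) else float("inf")
        result[r] = {
            # Subject: a noun-pronoun unit numbered below the verb, or empty.
            "subject": [n for n in nouns if n < r] + [None],
            "predicate": [r],
            # Object: a noun-pronoun unit between this verb and the next.
            "object": [n for n in nouns if r < n < next_r] + [None],
        }
    return result


units = [(1, "noun"), (2, "verb"), (3, "noun"), (4, "verb"), (5, "noun")]
cands = element_candidates(units)
```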
  • Step 130: According to the possible values of the guide element, the subject element, the predicate element, and the object element, obtain all possible values of the syntactic vector corresponding to each predicate verb unit, where the syntactic vector consists of a guide element, a subject element, a predicate element, and an object element.
  • each subject-predicate collocation structure can be represented by a syntactic vector.
  • For the sentence S “I can understand what what you said meant” shown in Table 1 above, there are:
  • f 3 = (what B, f 1 , r 3 , e) (3-1)
  • f 3 = (what A, I, r 3 , e) (3-9)
  • f 3 = (what B, f 2 , r 3 , e) (3-2)
  • f 3 = (what A, you, r 3 , e) (3-10)
  • f 3 = (what B, e, r 3 , e)
  • Step 140: Generate at least one syntactic structure possible matrix solution according to all possible values of all syntactic vectors, where each syntactic structure possible matrix solution is composed of syntactic vectors arranged in order of predicate verb unit number.
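  • As a non-limiting illustration, step 140 may be sketched as a Cartesian product that picks one candidate syntactic vector per predicate verb unit. The candidate values below are invented for illustration and are not taken from Table 1; the names `SyntaxVector` and `matrix_solutions` are hypothetical, and the empty unit is written `None`.

```python
from itertools import product
from typing import NamedTuple, Optional


class SyntaxVector(NamedTuple):
    guide: Optional[str]
    subject: Optional[str]
    predicate: str
    obj: Optional[str]


def matrix_solutions(candidates_per_predicate):
    """Pick one candidate per predicate verb, kept in predicate order."""
    return [list(combo) for combo in product(*candidates_per_predicate)]


# Hypothetical candidate lists for two predicate verb units r1 and r2.
f1 = [SyntaxVector(None, "I", "r1", "f2"), SyntaxVector(None, "I", "r1", None)]
f2 = [SyntaxVector("what A", "you", "r2", None)]
solutions = matrix_solutions([f1, f2])
```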
  • Step 150: Verify whether the statement obtained from each syntactic structure possible matrix solution is identical to the pre-processed statement. If it is identical, the syntactic vectors in that possible matrix solution are output as one of the syntactic structure analysis results.
  • In this verification, the word-unit numbers are used in place of the word units in the equal-substitution, overall insertion, and partial addition operations; whether the resulting statement has the same order as the pre-processed statement is then determined by checking whether the resulting number sequence is sequentially increasing.
  • Step 150 can include the following steps:
  • Step 151: If some order value does not appear in a syntactic structure possible matrix solution, that possible matrix solution is excluded; for example, for the following possible matrix solution:
  • the word unit numbered 4 does not appear, so the solution is excluded.
  • Step 152: If the same order value appears more than once, whether in different syntactic vectors or within the same syntactic vector, the syntactic structure possible matrix solution is excluded; for example, for the following possible matrix solution:
  • the word unit numbered 5 appears twice, so the solution is excluded.
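  • As a non-limiting illustration, the checks of steps 151 and 152 may be sketched as follows. The sketch assumes a possible matrix solution is a list of syntactic vectors whose elements are word-unit numbers, with any other element (such as the empty unit `None` or a reference to another vector) ignored. The name `order_values_ok` and the sample solutions are hypothetical.

```python
from collections import Counter


def order_values_ok(solution, k):
    """True if every order value 1..k appears exactly once in the solution."""
    counts = Counter(
        elem for vec in solution for elem in vec if isinstance(elem, int)
    )
    missing = [n for n in range(1, k + 1) if counts[n] == 0]   # step 151
    repeated = [n for n, c in counts.items() if c > 1]         # step 152
    return not missing and not repeated


good = order_values_ok([[None, 1, 2, 3], [None, 4, 5, None]], 5)
missing_4 = order_values_ok([[None, 1, 2, 3], [None, None, 5, None]], 5)
repeated_5 = order_values_ok([[None, 1, 2, 5], [3, 4, 5, None]], 5)
```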
  • Step 153: In each possible matrix solution, equal-substitution is applied to all syntactic vectors that have substitution relationships with other syntactic vectors; if a cross-contradiction between two syntactic vectors appears after the equal-substitution, the syntactic structure possible matrix solution is excluded;
  • Step 154: In each possible matrix solution, equal-substitution is applied to all syntactic vectors that have substitution relationships with other syntactic vectors; if two order values appear in reversed positions after the equal-substitution, the syntactic structure possible matrix solution is excluded;
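  • As a non-limiting illustration, the equal-substitution and the checks of steps 153 and 154 may be sketched as follows. The sketch assumes a possible matrix solution is a dictionary from vector names to element lists, where an element is a word-unit number, the name of another vector, or `None` for the empty unit. It also assumes, as an interpretation only, that a cross-contradiction means two vectors that substitute into each other. The names `equal_substitution` and `has_reversal` are hypothetical.

```python
def equal_substitution(solution):
    """Replace each vector reference by that vector's elements."""
    def expand(name, seen):
        if name in seen:
            raise ValueError("cross-contradiction involving " + name)
        row = []
        for elem in solution[name]:
            if isinstance(elem, str):          # reference to another vector
                row.extend(expand(elem, seen | {name}))
            elif elem is not None:             # skip the empty unit
                row.append(elem)
        return row

    expanded = {name: expand(name, frozenset()) for name in solution}
    referenced = {
        elem for vec in solution.values() for elem in vec if isinstance(elem, str)
    }
    # Vectors absorbed into another vector are eliminated from the result.
    return {n: row for n, row in expanded.items() if n not in referenced}


def has_reversal(row):
    """True if any adjacent pair of order values is out of increasing order."""
    return any(a > b for a, b in zip(row, row[1:]))


merged = equal_substitution({"f1": [None, 1, 2, "f2"], "f2": [None, 3, 4, None]})
```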
  • Step 155: In any possible matrix solution, if there is a syntactic vector that has no substitution relationship with the other syntactic vectors, perform insertion operations to obtain all possible syntactic parsing structures corresponding to that possible matrix solution, and verify whether the statement obtained from each possible syntactic parsing structure is identical to the pre-processed statement. This step further comprises:
  • Step 155.1: First apply equal-substitution to the syntactic vectors of the possible matrix solution that have substitution relationships, thereby transforming the possible matrix solution into a group of syntactic vectors that have no substitution relationship with one another.
  • The original syntactic vectors in the possible matrix solution are called the first type of syntactic vectors, and the syntactic vectors obtained by the transformation are called the second type of syntactic vectors.
  • Step 155.2. Take any second-type syntax vector and, following a predetermined direction, label the order value of each of its syntax elements one by one. After the order values are labeled, take the i-th syntax element of that vector and construct a single gap only on the first side of that element. After the gap is constructed, take any other second-type syntax vector and insert it as a whole into the constructed gap, generating a new syntax vector, which is given its own designation. Syntax vectors obtained by whole insertion are collectively called third-type syntax vectors;
  • Step 155.3. For a third-type syntax vector, one of the following two alternatives is applied.
  • In the first alternative, following the predetermined direction, order values are labeled on every syntax element from the first syntax element on the first side of the third-type vector through the first syntax element on the second side of the vector contained in it; the elements located on the first side of the contained vector are not labeled with order values. The first syntax element on the second side of the contained vector is given its own designation, and the portion of the third-type vector labeled in this way is called the tail syntax vector. After the order values are labeled, take the j-th syntax element of the tail vector and construct a single gap only on the first side of that element; after the gap is constructed, take any unused second-type syntax vector and insert it as a whole into the constructed gap, generating a new syntax vector;
  • In the second alternative, following the predetermined direction, order values are labeled on every syntax element of the third-type vector. After the order values are labeled, take the t-th syntax element of that vector and construct a single gap on the first side of that element; after the gap is constructed, take any unused second-type syntax vector and insert it as a whole into the previously constructed gap, generating a new vector.
  • Step 155.4. Step 155.3 is executed repeatedly: each time a round of gap construction and insertion ends, the third-type syntax vector obtained from that round is used for the next round of gap construction and insertion, until all second-type syntax vectors have been inserted. A single-row third-type syntax vector is finally obtained, and this finally obtained third-type syntax vector is called the final single-row vector;
  • Step 155.5. If two order values appear in reversed positions in all of the final single-row vectors corresponding to a possible syntax parsing structure, that possible syntax parsing structure is excluded;
  • Step 155.6. Steps 155.2 through 155.5 are repeated until all possible syntax parsing structures have been traversed.
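One round of gap construction and whole insertion (Steps 155.2 and 155.5) can be illustrated with the sketch below. It rests on several assumptions that the text does not fix: the "first side" is taken to be the left, the predetermined direction is left to right, a syntax vector is a Python list, `None` stands for the empty unit e, and only a single round of insertion is modeled. The helper names are hypothetical. The two vectors are read off the final single-row vector reported for the sentence "Jack who has a beautiful car is a businessman." in Example 3.

```python
def insert_whole(host, i, guest):
    """Open one gap on the left of host[i] and insert guest as a single unit."""
    return host[:i] + [guest] + host[i:]

def flatten(vector):
    """Read a nested vector left to right into a single row."""
    row = []
    for item in vector:
        if isinstance(item, list):
            row.extend(flatten(item))
        else:
            row.append(item)
    return row

def has_reversed_order(row):
    """True if two order values appear in reversed (non-ascending) positions."""
    values = [x for x in row if x is not None]
    return any(a >= b for a, b in zip(values, values[1:]))

def valid_insertions(host, guest):
    """Try every gap position and keep the results whose row stays ascending."""
    candidates = (insert_whole(host, i, guest) for i in range(len(host)))
    return [c for c in candidates if not has_reversed_order(flatten(c))]

host = [None, 1, 5, 6]    # (e, 1, 5, 6)
guest = [2, None, 3, 4]   # (2, e, 3, 4)
print(valid_insertions(host, guest))
```

Of the four possible gaps, only the one to the left of element 5 keeps the order values ascending, which gives the nesting e, 1, (2, e, 3, 4), 5, 6.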
  • syntax structure matrix solution can be expressed as:
  • the method may further include a displaying step of displaying each syntax vector in the syntax structure analysis result and the corresponding syntax structure relationship in a human-computer interaction interface by using a tree structure.
  • Example 2 As another example, the following describes how the method of the present embodiment parses a sentence with a complicated structure such as "That men who were appointed didn't bother the liberals wasn't remarked upon by the press."
  • the candidate matrices are substituted into the solver and the structure-correction program, which yields the following possible matrix solution as the final result of the syntax structure parsing:
  • This example sentence is a typical sentence that is successfully handled by the whole-insertion method described herein.
  • one of the whole-insertion results of the above possible matrix solution is the following final single-row vector: e+(1+2+(3+e+4+e)+5+6)+7+e.
  • This final single-row vector contains no reversed order values and is a reasonable final single-row vector.
  • This final single-line vector is identical to the original sentence.
  • This possible matrix solution is also the correct syntactic structure analysis result of this example sentence.
  • 1 is the main clause, that is, the core sentence; 3 is the object of 1, that is, an object clause; 2 is an attributive clause modifying "men"; "That" is a determiner modifying "men".
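The check applied to the final single-row vector of Example 2 can be reproduced with a few lines of Python. This is only an illustrative sketch: the operator between elements is written as a plain `+`, `e` is the empty unit, the count of seven word units is taken from the order values that appear in the vector, and the function names are hypothetical.

```python
import re

def row_from_expression(expr):
    """Read the order values and empty units (e) of a single-row vector, left to right."""
    return [None if tok == "e" else int(tok) for tok in re.findall(r"\d+|e", expr)]

def matches_original(expr, n_units):
    """True if the non-empty order values are exactly 1..n_units in ascending order."""
    values = [v for v in row_from_expression(expr) if v is not None]
    return values == list(range(1, n_units + 1))

final_row = "e+(1+2+(3+e+4+e)+5+6)+7+e"
print(matches_original(final_row, 7))
```

Ignoring the empty units, the row reads 1 through 7 with no reversal, which is what the text means by the final single-row vector being identical to the original sentence.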
  • Example 3 As another example, the following describes how the method of the present embodiment parses a sentence with a complicated structure such as "Jack who has a beautiful car is a businessman." The sentence is preprocessed to remove irrelevant material, and the resulting numbered word sequence is:
  • f_2(who, a car, r_2, a businessman) (2-11)
  • f_2(who, a car, r_2, e) (2-4)
  • f_2(e, a car, r_2, a businessman) (2-12)
  • f_2(e, a car, r_2, e) (2-5)
  • f_2(who, f_1, r_2, a businessman) (2-13)
  • f_2(who, f_1, r_2, e) (2-6)
  • f_2(e, f_1, r_2, a businessman) (2-14)
  • f_2(e, f_1, r_2, e) (2-7)
  • f_2(who, e, r_2, a businessman) (2-15)
  • f_2(who, e, r_2, e) (2-8)
  • f_2(e, e, r_2, a businessman) (2-16)
  • f_2(e, e, r_2, a businessman)
  • f_2
  • This example sentence is a typical sentence that is successfully handled by the whole-insertion method described herein.
  • the above possible matrix solution yields exactly one final single-row vector without reversed order values: e+1+(2+e+3+4)+5+6.
  • This final single-row vector is a reasonable final single-row vector.
  • The order values of this final single-row vector are exactly the same as those of the original sentence.
  • This possible matrix solution is also the correct syntactic structure analysis result of this example sentence.
  • Example 4 As another example, the parsing process of the method of the present embodiment for a sentence of a parallel structure such as "After Jack, Mary and Linda left, I gave my son a new book.” will be described below.
  • The steps include generating parallel noun pronoun combination vector families:
  • the two noun pronoun units are used as a parallel noun pronoun combination vector, and the parallel noun pronoun combination vector is retained;
  • S2.4. In each noun pronoun combination vector family, the word unit with the highest number among all noun pronoun combination vectors of the family is selected as the maximum word unit of that family, for use in subsequently generating subjects;
  • likewise, the word unit with the lowest number among all noun pronoun combination vectors of the family is selected as the minimum word unit of that family, for use in subsequently generating objects.
  • G_k denotes the union of all parallel noun pronoun combination vector families whose maximum word unit number is smaller than the number of the corresponding predicate verb unit.
  • H_k denotes the union of all parallel noun pronoun combination vector families whose minimum word unit number is greater than the number of the corresponding predicate verb unit and smaller than the number of the next predicate verb unit.
  • the parallel noun pronoun combination vector is treated as a whole: it cannot be inserted into other syntax vectors, and when order values are checked, the order values of the word units it contains are substituted directly.
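The role of the maximum and minimum word units of a family can be sketched as follows. The family itself is taken as given, because the steps that build it are not fully reproduced here. The unit numbers, the verb number, and the function names are hypothetical, and the treatment of a verb with no later verb (`next_verb_no=None`) is an assumption.

```python
def family_bounds(family):
    """Return (minimum, maximum) word-unit number over all vectors in a family."""
    units = [u for vector in family for u in vector]
    return min(units), max(units)

def usable_as_subject(family, verb_no):
    """A family can supply subjects if its maximum word unit precedes the verb."""
    return family_bounds(family)[1] < verb_no

def usable_as_object(family, verb_no, next_verb_no=None):
    """A family can supply objects if its minimum word unit follows the verb
    and, when a later verb exists, precedes that next verb."""
    lowest, _ = family_bounds(family)
    return lowest > verb_no and (next_verb_no is None or lowest < next_verb_no)

# Hypothetical family built from three noun units numbered 2, 3 and 4.
family = [(2, 3), (3, 4), (2, 3, 4)]
print(family_bounds(family))
print(usable_as_subject(family, 5), usable_as_object(family, 5))
```

With a predicate verb numbered 5, this family qualifies for G_k (subject candidates) but not for H_k (object candidates), since all of its units precede the verb.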
  • Example 5 As another example, the parsing process of the method of the present embodiment for a sentence of a parallel structure such as "Linda was singing, and Mary was dancing.” will be described below.
  • Example 6 As another example, the parsing process of the method of the present embodiment for a sentence of a parallel structure such as "I know that you have a car and that he has a bike.” will be described below.
  • the set of guide elements corresponding to each predicate verb unit r_k is:
  • the guide element corresponding to the predicate verb unit r_k is x_k, and its set of possible values is {x_k}.
  • the set of possible values of the guide element x_k of the predicate verb unit r_k is:
  • f_2(that A, f_1, r_2, a car) (2-17)
  • f_2(that A, you, r_2, e) (2-6)
  • f_2(e, f_1, r_2, a car) (2-18)
  • f_2(e, you, r_2, e) (2-7)
  • f_2(that A, e, r_2, a car) (2-19)
  • f_2(that A, I, r_2, e) (2-8)
  • f_2(e, e, r_2, a car) (2-20)
  • f_2(e, I, r_2, e) (2-9)
  • f_2(that A, you, r_2, f_3) (2-21)
  • f_2(that A, f_1, r_2, e) (2-10)
  • f_2(e, you, r_2, f_3) (2-22)
  • f_2(e, f_1, r_2, e) (2-11)
  • the associated word combination vector cannot be inserted into other syntax vectors; when order values are checked, the two order values contained in the associated word combination vector are substituted directly.
  • the end result is that the second object clause is considered to be inserted into the empty space at the end of the first object clause.
  • FIG. 2 is a schematic diagram of the computer-based natural language syntax structure parsing apparatus of the present invention. As shown, the apparatus includes:
  • the reading component 21 is configured to read the preprocessed sentence data structure to be parsed; the preprocessed sentence data structure includes only the parallel related word units, subordinate related word units, predicate verb units, and noun pronoun units of the sentence, and each word unit is numbered according to its order in the preprocessed sentence and labeled with its type;
  • the element generating component 22 is configured to generate, for each predicate verb unit, a corresponding guide element, subject element, predicate element, and object element;
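  • the possible values of the guide element are: one of the parallel related word units or subordinate related word units whose number is smaller than the number of the corresponding predicate verb unit; or one of the associated word combination vectors, each formed by a parallel related word unit whose number is smaller than the number of the corresponding predicate verb unit and an adjacent subordinate related word unit whose number is smaller than the number of the corresponding predicate verb unit and greater than the number of that parallel related word unit; or the empty unit;
  • the possible values of the subject element are: one of the noun pronoun units whose number is smaller than the number of the corresponding predicate verb unit; or one of the parallel noun pronoun combination vectors contained in all parallel noun pronoun combination vector families whose maximum word unit number is smaller than the number of the corresponding predicate verb unit; or one of the syntax vectors corresponding to a preceding predicate element; or the empty unit;
  • the predicate element is the corresponding predicate verb unit;
  • the possible values of the object element are: one of the noun pronoun units whose number is greater than the number of the corresponding predicate verb unit and smaller than the number of the next following predicate verb unit; or one of the parallel noun pronoun combination vectors contained in all parallel noun pronoun combination vector families whose minimum word unit number is greater than the number of the corresponding predicate verb unit and smaller than the number of the next following predicate verb unit; or one of the syntax vectors corresponding to a following predicate element; or the empty unit;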
  • a vector generating component 23 configured to obtain, according to the possible values of the guide element, the subject element, the predicate element, and the object element, all possible values of the syntax vector corresponding to each predicate verb unit, where the syntax vector includes a guide element, a subject element, a predicate element, and an object element;
  • a matrix generating component 24 configured to generate at least one syntax structure possible matrix solution according to all possible values of all syntax vectors, wherein the syntax structure possible matrix solution consists of a syntax vector arranged according to a predicate verb unit number order;
  • the solving component 25 is configured to verify whether the sentence obtained from a possible matrix solution of the syntax structure is identical to the preprocessed sentence; if they are identical, the syntax vectors in that possible matrix solution are taken as one of the syntax structure parsing results;
  • the solving component 25 excludes possible syntactic structure solutions by the following module operations:
  • a first exclusion module: if some order value does not appear in a possible matrix solution of the syntax structure, that possible matrix solution is excluded;
  • a second exclusion module: if the same order value appears in different syntax vectors, or the same syntax vector appears more than once, that possible matrix solution is excluded;
  • a third exclusion module: in each possible matrix solution, every syntax vector that has a mutual substitution relationship with another syntax vector is replaced by equal substitution, and if a crossing contradiction between two syntax vectors appears after the substitution, that possible matrix solution is excluded;
  • a fifth exclusion module: in any possible matrix solution, if there is a syntax vector that has no mutual substitution relationship with the other syntax vectors, the gap-insertion operation is performed to obtain all possible syntax parsing structures corresponding to that possible matrix solution, and it is verified whether the sentence obtained from each possible syntax parsing structure is identical to the preprocessed sentence; the fifth exclusion module further comprises:
  • the first sub-module, which first performs equal substitution on the syntax vectors in the possible matrix solution that have a substitution relationship with one another, thereby transforming the possible matrix solution into a set of syntax vectors that have no substitution relationship with one another; the syntax vectors in the possible matrix solution are called first-type syntax vectors, and the syntax vectors obtained by the transformation are called second-type syntax vectors;
  • the second sub-module, which takes any second-type syntax vector, labels the order values of its syntax elements one by one in the predetermined direction, and then performs the gap construction and whole insertion described in Step 155.2 to obtain a third-type syntax vector;
  • the third sub-module, which applies one of two alternatives to a third-type syntax vector. In the first alternative, following the predetermined direction, order values are labeled on every syntax element from the first syntax element on the first side of the third-type vector through the first syntax element on the second side of the vector contained in it, while the elements located on the first side of the contained vector are not labeled; the portion labeled in this way is called the tail syntax vector, and an unused second-type syntax vector is inserted as a whole into a single gap constructed on the first side of the j-th syntax element of the tail vector. In the second alternative, order values are labeled on every syntax element of the third-type vector, and after the labeling an unused second-type syntax vector is inserted as a whole into a single gap constructed on the first side of the t-th syntax element, generating a new vector;
  • the fourth sub-module, which repeats the operation of the third sub-module: each time a round of gap construction and insertion ends, the third-type syntax vector obtained from that round is used for the next round of gap construction and insertion, until all second-type syntax vectors have been inserted; a single-row third-type syntax vector is finally obtained, and this finally obtained third-type syntax vector is called the final single-row vector;
  • a fifth sub-module: if two order values appear in reversed positions in all of the final single-row vectors corresponding to a possible syntax parsing structure, that possible syntax parsing structure is excluded;
  • the sixth sub-module, which repeatedly invokes the operations of the second through fifth sub-modules until all possible syntax parsing structures have been traversed.
  • the device may further include:
  • the result display component displays the syntax vector and the corresponding syntax structure relationship in the syntax structure analysis result on the human-computer interaction interface by using a tree structure.
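The element, vector, and matrix generating components can be illustrated with a deliberately reduced sketch. It assumes that guide, subject, and object candidates are limited to single word units and the empty unit (`None`); associated word combination vectors, parallel noun pronoun combination vectors, and references to other syntax vectors are left out. The type labels `"conj"`, `"noun"`, `"verb"` and the function names are hypothetical, and the numbering of "Jack who has a beautiful car is a businessman." is inferred from the final single-row vector given in Example 3.

```python
from itertools import product

def vector_candidates(units, verb_no, next_verb_no):
    """All (guide, subject, predicate, object) tuples for one predicate verb."""
    upper = next_verb_no if next_verb_no is not None else float("inf")
    guides = [n for n, t in units.items() if t == "conj" and n < verb_no] + [None]
    subjects = [n for n, t in units.items() if t == "noun" and n < verb_no] + [None]
    objects = [n for n, t in units.items() if t == "noun" and verb_no < n < upper] + [None]
    return [(g, s, verb_no, o) for g, s, o in product(guides, subjects, objects)]

def matrix_solutions(units):
    """Cartesian product of the per-verb candidates, in predicate verb order."""
    verbs = sorted(n for n, t in units.items() if t == "verb")
    per_verb = [vector_candidates(units, v, nxt)
                for v, nxt in zip(verbs, verbs[1:] + [None])]
    return [tuple(rows) for rows in product(*per_verb)]

# Jack(1) who(2) has(3) a car(4) is(5) a businessman(6); numbering is an assumption.
units = {1: "noun", 2: "conj", 3: "verb", 4: "noun", 5: "verb", 6: "noun"}
solutions = matrix_solutions(units)
print(len(solutions))
```

Even this reduced candidate space grows multiplicatively with the number of predicate verbs, which is in line with the description's remark that the method requires a large amount of computation before the exclusion modules narrow the candidates down.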
  • the present invention focuses on solving the problem of accurate parsing of compound sentence structures in natural language.
  • the most important features of the present invention are: (1) making full use of the properties of composite functions; (2) using a matrix model and a linear model to describe sentence patterns; (3) using the relevant principles of combinatorial mathematics to generate the matrix model and the linear model.
  • the accuracy of the natural language syntax structure analysis can be improved.
  • natural language has discrete characteristics, and this is the difficulty in the parsing of syntactic structures.
  • the invention effectively combines the syntactic vector with the matrix form, without destroying the integrity of the sentence structure, and does not hinder the analysis of the intrinsic components and the relationship between the words in each sentence.
  • the invention adopts a matrix model and a linear model to characterize the sentence formula, which not only conforms to the discrete characteristics of natural language, but also effectively reveals the information association on the syntactic structure.
  • the present invention adopts a matrix model and a linear model to convert a single-line natural language sentence into a hierarchical, linearly nested form, thereby largely avoiding the confusion that arises when a computer directly labels the original natural language sentence and partitions its structure, and making the computer's programmed task clearer and more concise.
  • the matrix model and the linear model used in the present invention are equivalent to drawing a plurality of parallel runways for natural language sentences, so that the natural language sentences start at the same time on a plurality of parallel runways, and then the correct results are screened therefrom; Provide multiple planes for natural language statements, process natural language statements on multiple planes, and then filter the correct results.
  • the present invention uses the relevant principles of combinatorial mathematics to generate all candidate matrices and then excludes them one by one, finally obtaining at least one possibly correct syntax structure parsing result.
  • In this process only mathematical principles and information coding are needed, and only real-number values have to be processed.
  • Every step ultimately comes down to checking whether the order values in the syntax vectors are in ascending order, that is, comparing the sizes of real numbers, without involving the linguistic information of English itself.
  • the present invention requires a large amount of mathematical operations, and therefore must be realized by the computing power of the computer.
  • the present invention is based on the mathematical principles of abstract algebra, set theory, combinatorial mathematics, computability theory, and computational linguistics, together with the corresponding computer techniques; using the mathematical idea of composite functions, it parses natural language syntactic structures by establishing a matrix model and a linear model and constructing a recursive function.
  • the invention has a unique concept, an ingenious method, and detailed argumentation; it makes full use of the laws of mathematics and computer science, and the method has high accuracy and high technical difficulty.

Abstract

A computer-based method and device for parsing natural language syntactic structures. Natural language syntactic structures are parsed by building a matrix model and a linear model and constructing a recursive function through the mathematical thought of a composite function according to mathematical principles of subjects comprising the abstract algebra, the set theory, the combinatorial mathematics, the computability theory, the computational linguistics and the like and corresponding computer technologies; meanwhile, methods such as the mathematical induction are comprehensively applied to proving important conclusions. A set of brand new mathematical models are built for sentences of the natural language, and the thought is basically different from that of a conventional traditional method; two overall plug-in methods comprising a single-side same-direction order preserving method and a single-side same-direction non-order-preserving method are creatively provided, and a parallel syntactic constituent generating and processing method of a set family is creatively applied; the rules of the mathematics subjects and the computer subjects are sufficiently used, and the method has high accuracy and large operation amount, and has certain technological difficulty.

Description

Computer-based method and device for parsing natural language syntactic structures
This application claims priority to Chinese Patent Application No. 2014104196340, filed on August 22, 2014 and entitled "A computer-based method and device for parsing natural language syntactic structures", the entire contents of which are incorporated herein by reference.
Technical field
The present invention relates to the field of computer data processing, and in particular to a computer-based method and device for parsing natural language syntactic structures.
Background
Natural language processing is an important direction in computer science and artificial intelligence. It studies the theories and methods that enable effective communication between humans and computers using natural language.
Syntactic structure parsing is an important aspect of natural language processing: a computer automatically divides a natural language sentence into its sentence constituents to assist further processing of the sentence. Existing syntactic parsing techniques usually use probabilistic context-free grammars (PCFG), which, based on the complex nesting of natural language, compute the rule-matching probability between a sentence and each candidate parse and select the parse with the highest probability as the final syntactic structure.
However, this approach has high complexity, and its parsing accuracy for compound sentence structures still needs further improvement.
Summary of the invention
In view of this, the present invention provides a computer-based method and device for parsing natural language syntactic structures. The concept is unique, the method ingenious, and the argumentation detailed, and the invention makes full use of the laws of mathematics and computer science. The method has relatively high accuracy, a very large amount of computation, and considerable technical difficulty.
The present invention provides a computer-based method for parsing natural language syntactic structures, comprising:
S1. Read the preprocessed sentence data structure to be parsed. The preprocessed sentence data structure includes only the parallel related word units, subordinate related word units, predicate verb units, and noun pronoun units of the sentence; each word unit is numbered according to its order in the preprocessed sentence and labeled with its type;
S2. For each predicate verb unit, generate a corresponding guide element, subject element, predicate element, and object element;
The possible values of the guide element are: one of the parallel related word units or subordinate related word units whose number is smaller than the number of the corresponding predicate verb unit; or one of the associated word combination vectors, each formed by a parallel related word unit whose number is smaller than the number of the corresponding predicate verb unit and an adjacent subordinate related word unit whose number is smaller than the number of the corresponding predicate verb unit and greater than the number of that parallel related word unit; or the empty unit;
The possible values of the subject element are: one of the noun pronoun units whose number is smaller than the number of the corresponding predicate verb unit; or one of the parallel noun pronoun combination vectors contained in all parallel noun pronoun combination vector families whose maximum word unit number is smaller than the number of the corresponding predicate verb unit; or one of the syntax vectors corresponding to a preceding predicate element; or the empty unit;
The predicate element is the corresponding predicate verb unit;
The possible values of the object element are: one of the noun pronoun units whose number is greater than the number of the corresponding predicate verb unit and smaller than the number of the next following predicate verb unit; or one of the parallel noun pronoun combination vectors contained in all parallel noun pronoun combination vector families whose minimum word unit number is greater than the number of the corresponding predicate verb unit and smaller than the number of the next following predicate verb unit; or one of the syntax vectors corresponding to a following predicate element; or the empty unit;
S3. According to the possible values of the guide element, subject element, predicate element, and object element, obtain all possible values of the syntax vector corresponding to each predicate verb unit; the syntax vector comprises a guide element, a subject element, a predicate element, and an object element;
S4. According to all possible values of all syntax vectors, generate at least one possible matrix solution of the syntax structure; a possible matrix solution consists of syntax vectors arranged in the order of the predicate verb unit numbers;
S5. Verify whether the sentence obtained from a possible matrix solution of the syntax structure is identical to the preprocessed sentence; if it is identical, the syntax vectors in that possible matrix solution are taken as one of the syntax structure parsing results;
wherein S5 comprises performing the following operations in order, to exclude possible solutions of the syntax structure that do not satisfy the conditions:
S5.1. If some order value does not appear in a possible matrix solution of the syntax structure, that possible matrix solution is excluded;
S5.2. If the same order value appears in different syntax vectors, or the same syntax vector appears more than once, that possible matrix solution is excluded;
S5.3. In each possible matrix solution, every syntax vector that has a mutual substitution relationship with another syntax vector is replaced by equal substitution; if a crossing contradiction between two syntax vectors appears after the substitution, that possible matrix solution is excluded;
S5.4. In each possible matrix solution, every syntax vector that has a mutual substitution relationship with another syntax vector is replaced by equal substitution; if two order values appear in reversed positions after the substitution, that possible matrix solution is excluded;
S5.5. In any possible matrix solution, if there is a syntax vector that has no mutual substitution relationship with the other syntax vectors, the gap-insertion operation is performed to obtain all possible syntax parsing structures corresponding to that possible matrix solution, and it is verified whether the sentence obtained from each possible syntax parsing structure is identical to the preprocessed sentence. This further comprises:
S5.5.1. First, perform equal substitution on the syntax vectors in the possible matrix solution that have a substitution relationship with one another, thereby transforming the possible matrix solution into a set of syntax vectors that have no substitution relationship with one another (Figure PCTCN2015083760-appb-000001). The syntax vectors in the possible matrix solution are called first-type syntax vectors, and the syntax vectors obtained by the transformation (Figure PCTCN2015083760-appb-000002) are called second-type syntax vectors;
S5.5.2. Take any second-type syntax vector (Figure PCTCN2015083760-appb-000003) and, following a predetermined direction, label one by one the order value of each syntax element in it (Figure PCTCN2015083760-appb-000004). After the order values are labeled, take the i-th syntax element in it (Figure PCTCN2015083760-appb-000005) and construct a single gap only on the first side of that syntax element. After the gap is constructed, take any second-type syntax vector (Figure PCTCN2015083760-appb-000007) other than that syntax vector (Figure PCTCN2015083760-appb-000006), insert it (Figure PCTCN2015083760-appb-000008) as a whole into the constructed gap, thereby generating a new syntax vector, and denote the new syntax vector as shown in Figure PCTCN2015083760-appb-000009. Syntax vectors obtained by whole insertion are collectively called third-type syntax vectors;
S5.5.3. For a third-type syntax vector (Figure PCTCN2015083760-appb-000010), following the predetermined direction, label order values on every syntax element from the first syntax element on the first side of the vector (Figure PCTCN2015083760-appb-000011) through the first syntax element on the second side of the vector (Figure PCTCN2015083760-appb-000013) contained in the vector (Figure PCTCN2015083760-appb-000012). The elements located on the first side of the vector (Figure PCTCN2015083760-appb-000015) contained in the vector (Figure PCTCN2015083760-appb-000014) are not labeled with order values. Denote the first syntax element on the second side of the vector (Figure PCTCN2015083760-appb-000016) as shown in Figure PCTCN2015083760-appb-000017, and denote the portion of the vector (Figure PCTCN2015083760-appb-000018) labeled in this way as the tail syntax vector (Figure PCTCN2015083760-appb-000019). After the order values are labeled, take the j-th syntax element of the tail vector and construct a single gap only on the first side of that element. After the gap is constructed, take any unused second-type syntax vector (Figure PCTCN2015083760-appb-000020), insert it (Figure PCTCN2015083760-appb-000021) as a whole into the constructed gap, thereby generating a new syntax vector, and denote the newly generated syntax vector as shown in Figure PCTCN2015083760-appb-000022;
or
第三类句法向量
Figure PCTCN2015083760-appb-000023
按照预定方向,对句法向量
Figure PCTCN2015083760-appb-000024
中的每一个句法元素全都标注顺序值;标注句法元素的顺序值之后,任取一个
Figure PCTCN2015083760-appb-000025
中的第t个句法元素,在该句法元素的第一侧构造唯一的空位;造空之后,任取一个未使用过得第二类句法向量
Figure PCTCN2015083760-appb-000026
以整体插空的方式将该向量
Figure PCTCN2015083760-appb-000027
插入前面构造的空位,进而生成一个新向量,则该新向量记为
Figure PCTCN2015083760-appb-000028
Third type of syntax vector
Figure PCTCN2015083760-appb-000023
Syntactic vector according to the predetermined direction
Figure PCTCN2015083760-appb-000024
Each syntax element in the label is labeled with a sequential value; after the order value of the syntax element is annotated, take one
Figure PCTCN2015083760-appb-000025
The tth syntax element in the construct, constructing a unique gap on the first side of the syntax element; after the void, taking an unused second type of syntax vector
Figure PCTCN2015083760-appb-000026
The vector is inserted as a whole
Figure PCTCN2015083760-appb-000027
Insert the previously constructed gap and generate a new vector, then the new vector is recorded as
Figure PCTCN2015083760-appb-000028
S5.5.4: Repeat S5.5.3: each time a gap-construction and insertion step ends, perform the next gap-construction and insertion operation on the third-type syntactic vector obtained from the previous step, until all second-type syntactic vectors [Figure PCTCN2015083760-appb-000029] have been inserted. The result is a single-row third-type syntactic vector, which is called the final single-row vector;
S5.5.5: If every final single-row vector corresponding to a possible syntactic parsing structure contains two order values in inverted positions, exclude that possible syntactic parsing structure;
S5.5.6: Repeat S5.5.2 to S5.5.5 until all possible syntactic parsing structures have been traversed.
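The exclusion test of S5.5.5 can be illustrated with a short sketch. It assumes that a final single-row vector is a list of integer order values and that "two order values in inverted positions" means some order value appears before a smaller one; the function names are hypothetical.

```python
def has_inversion(order_values):
    """True if some order value is immediately followed by a smaller one.

    Any out-of-order pair in a sequence implies an adjacent out-of-order
    pair, so checking neighbours is sufficient.
    """
    return any(a > b for a, b in zip(order_values, order_values[1:]))


def should_exclude(final_rows):
    """S5.5.5 as read here: exclude the parsing structure only when
    every one of its final single-row vectors contains an inversion."""
    return all(has_inversion(row) for row in final_rows)
```

Under this reading, a parsing structure survives as soon as one of its final single-row vectors restores the original word order.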
Further, S2 includes generating families of parallel noun-pronoun combination vectors:
S2.1: Select two distinct noun-pronoun units:
A. If there is no other word unit between the two noun-pronoun units, take the two noun-pronoun units as one parallel noun-pronoun combination vector and retain it;
B. If there are other word units between the two noun-pronoun units, examine each word unit between them: if every word unit between the two noun-pronoun units is a noun-pronoun unit or a parallel connective unit, take the two selected noun-pronoun units together with all word units between them as one parallel noun-pronoun combination vector and retain it; otherwise, do not generate a parallel noun-pronoun combination vector;
S2.2: Repeat S2.1 until all combinations of noun-pronoun units have been traversed, obtaining all parallel noun-pronoun combination vectors;
S2.3: If parallel noun-pronoun combination vectors exist for the possible syntactic parsing structure, partition all of them into several families of parallel noun-pronoun combination vectors such that, within each family, every parallel noun-pronoun combination vector contains two common noun-pronoun units;
S2.4: In each family, select the word unit with the largest number among all combination vectors in the family as the maximum word unit of the family, for later use in generating subjects; select the word unit with the smallest number among all combination vectors in the family as the minimum word unit of the family, for later use in generating objects.
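Steps S2.1 and S2.2 can be sketched as below. The sketch assumes word units are numbered consecutively from 1 and stored in a dictionary mapping number to a type tag ("N" noun-pronoun, "C" parallel connective, "S" subordinate connective, "V" predicate verb); these tags and the function name are illustrative assumptions. The family partition of S2.3 and the maximum/minimum selection of S2.4 are not implemented here.

```python
from itertools import combinations


def parallel_np_combinations(units):
    """Return every parallel noun-pronoun combination vector (S2.1-S2.2).

    units: dict {number: type tag}, numbers assumed consecutive.
    Each combination vector is returned as a tuple of unit numbers.
    """
    nouns = sorted(n for n, t in units.items() if t == "N")
    combos = []
    for lo, hi in combinations(nouns, 2):
        between = [units[k] for k in range(lo + 1, hi)]
        # Case A: nothing in between. Case B: only "N" or "C" in between.
        if all(t in ("N", "C") for t in between):
            combos.append(tuple(range(lo, hi + 1)))
    return combos


# Hypothetical sentence skeleton: N C N V N
example = {1: "N", 2: "C", 3: "N", 4: "V", 5: "N"}
combos = parallel_np_combinations(example)
```

In this example only units 1 to 3 form a combination vector, because a predicate verb unit separates unit 5 from the others.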
Further, generating the corresponding subject element includes:
When the corresponding predicate verb unit number is the smallest predicate verb unit number, the possible values of the subject element are: one of the noun-pronoun units whose number is smaller than the corresponding predicate verb unit number; or one of the parallel noun-pronoun combination vectors contained in any family of parallel noun-pronoun combination vectors whose maximum word unit has a number smaller than the corresponding predicate verb unit number; or the empty unit.
When the corresponding predicate verb unit number is not the smallest predicate verb unit number, the possible values of the subject element are: one of the noun-pronoun units whose number is smaller than the corresponding predicate verb unit number; or one of the parallel noun-pronoun combination vectors contained in any family of parallel noun-pronoun combination vectors whose maximum word unit has a number smaller than the corresponding predicate verb unit number; or one of the syntactic vectors corresponding to a preceding predicate verb unit; or the empty unit.
Further, generating the corresponding object element includes:
When the corresponding predicate verb unit number is the largest predicate verb unit number, the possible values of the object element are: one of the noun-pronoun units whose number is greater than the corresponding predicate verb unit number; or one of the parallel noun-pronoun combination vectors contained in any family of parallel noun-pronoun combination vectors whose minimum word unit has a number greater than the corresponding predicate verb unit number; or the empty unit.
When the corresponding predicate verb unit number is not the largest predicate verb unit number, the possible values of the object element are: one of the noun-pronoun units whose number is greater than the corresponding predicate verb unit number and smaller than the number of the next predicate verb unit; or one of the parallel noun-pronoun combination vectors contained in any family of parallel noun-pronoun combination vectors whose minimum word unit has a number greater than the corresponding predicate verb unit number and smaller than the number of the next predicate verb unit; or one of the syntactic vectors corresponding to a subsequent predicate verb unit; or the empty unit.
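The candidate sets described above can be sketched in code. The sketch makes several simplifying assumptions that are not in the application: units are a dictionary of number to type tag ("N", "V", and so on), each combination vector is treated as its own family (so its own largest or smallest number is used as the bound), a reference to another predicate's syntactic vector is written as the tuple `("vec", number)`, and the empty unit is written as the string `"e"`.

```python
EMPTY = "e"  # stand-in for the empty unit


def subject_candidates(units, combos, v):
    """Possible subject elements for the predicate verb unit numbered v."""
    verbs = sorted(n for n, t in units.items() if t == "V")
    cands = [n for n, t in sorted(units.items()) if t == "N" and n < v]
    cands += [c for c in combos if max(c) < v]
    if v != verbs[0]:
        # Only a non-first predicate may take an earlier clause as subject.
        cands += [("vec", p) for p in verbs if p < v]
    return cands + [EMPTY]


def object_candidates(units, combos, v):
    """Possible object elements for the predicate verb unit numbered v."""
    verbs = sorted(n for n, t in units.items() if t == "V")
    later = [p for p in verbs if p > v]
    upper = later[0] if later else float("inf")  # next predicate, if any
    cands = [n for n, t in sorted(units.items()) if t == "N" and v < n < upper]
    cands += [c for c in combos if v < min(c) < upper]
    cands += [("vec", p) for p in later]
    return cands + [EMPTY]


# Hypothetical skeleton: N V N S N V N
units = {1: "N", 2: "V", 3: "N", 4: "S", 5: "N", 6: "V", 7: "N"}
```

Taking the Cartesian product of the guide, subject, predicate and object candidates for each predicate verb unit would then give all possible values of its syntactic vector.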
Further, in steps S4 and S5, a possible linear-expression solution of the syntactic structure is used in place of the possible matrix solution of the syntactic structure;
The possible linear-expression solution of the syntactic structure is equivalent to the possible matrix solution of the syntactic structure;
The possible linear-expression solution of the syntactic structure consists of syntactic vector expressions arranged in order of predicate verb unit number; each syntactic vector expression is the expression obtained by partially adding, term by term and in order, the guide element, subject element, predicate element and object element of the corresponding syntactic vector.
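As a small illustration of the linear-expression form, the sketch below renders each syntactic vector as a chain of partial additions. The textual operator `+<`, the `V<number>` naming of syntactic vectors, and the dictionary layout are all assumptions made for this example.

```python
def vector_expression(guide, subject, predicate, obj):
    """Render one syntactic vector as guide +< subject +< predicate +< object."""
    return " +< ".join([guide, subject, predicate, obj])


def linear_solution(vectors):
    """vectors: dict {predicate number: (guide, subject, predicate, object)}.

    Returns the expressions ordered by predicate verb unit number.
    """
    return [f"V{k} = {vector_expression(*vectors[k])}" for k in sorted(vectors)]


# Hypothetical two-clause solution; "e" is the empty unit.
solution = linear_solution({
    5: ("a3", "a4", "a5", "e"),
    2: ("e", "a1", "a2", "V5"),
})
```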
Further, the method also includes:
displaying, in a human-computer interaction interface, each syntactic vector in the syntactic structure parsing result and the corresponding syntactic structure relations as a tree structure.
The present invention also provides a computer-based apparatus for parsing natural language syntactic structures, comprising:
a reading component, configured to read a preprocessed sentence data structure to be parsed, where the preprocessed sentence data structure contains only the parallel connective units, subordinate connective units, predicate verb units and noun-pronoun units of the sentence, and each word unit is numbered according to its order in the preprocessed sentence and labeled with its type;
an element generating component, configured to generate, for each predicate verb unit, the corresponding guide element, subject element, predicate element and object element;
wherein the possible values of the guide element are: one of the parallel connective units or subordinate connective units whose number is smaller than the corresponding predicate verb unit number; or one of the connective combination vectors formed by a parallel connective unit whose number is smaller than the corresponding predicate verb unit number and a subordinate connective unit that is adjacent to it and whose number is smaller than the corresponding predicate verb unit number and greater than the number of that parallel connective unit; or the empty unit;
the possible values of the subject element are: one of the noun-pronoun units whose number is smaller than the corresponding predicate verb unit number; or one of the parallel noun-pronoun combination vectors contained in any family of parallel noun-pronoun combination vectors whose maximum word unit has a number smaller than the corresponding predicate verb unit number; or one of the syntactic vectors corresponding to a preceding predicate element; or the empty unit;
the predicate element is the corresponding predicate verb unit;
the possible values of the object element are: one of the noun-pronoun units whose number is greater than the corresponding predicate verb unit number and smaller than the number of the next predicate verb unit; or one of the parallel noun-pronoun combination vectors contained in any family of parallel noun-pronoun combination vectors whose minimum word unit has a number greater than the corresponding predicate verb unit number and smaller than the number of the next predicate verb unit; or one of the syntactic vectors corresponding to a subsequent predicate element; or the empty unit;
a vector generating component, configured to obtain, from the possible values of the guide element, subject element, predicate element and object element, all possible values of the syntactic vector corresponding to each predicate verb unit, where a syntactic vector comprises a guide element, a subject element, a predicate element and an object element;
a matrix generating component, configured to generate, from all possible values of all syntactic vectors, at least one possible matrix solution of the syntactic structure, where a possible matrix solution consists of syntactic vectors arranged in order of predicate verb unit number;
a solving component, configured to verify whether the sentence obtained from a possible matrix solution of the syntactic structure is identical to the preprocessed sentence and, if so, to take the syntactic vectors in that possible matrix solution as one of the syntactic structure parsing results;
wherein the solving component excludes non-conforming possible solutions of the syntactic structure through the following modules:
a first exclusion module which, if there is an order value that does not appear in the possible matrix solution of the syntactic structure, excludes that possible matrix solution;
a second exclusion module which, if the same order value or the same syntactic vector appears in different syntactic vectors, excludes that possible matrix solution;
a third exclusion module which, in each possible matrix solution, performs equivalent substitution on all syntactic vectors that have mutual substitution relations with other syntactic vectors and, if a crossing contradiction between two syntactic vectors appears after the substitution, excludes that possible matrix solution;
a fourth exclusion module which, in each possible matrix solution, performs equivalent substitution on all syntactic vectors that have mutual substitution relations with other syntactic vectors and, if two order values in inverted positions appear after the substitution, excludes that possible matrix solution;
a fifth exclusion module which, in any possible matrix solution, if there are syntactic vectors that have no mutual substitution relation with other syntactic vectors, performs gap-insertion operations to obtain all possible syntactic parsing structures corresponding to that possible matrix solution, and verifies whether the sentence obtained from each possible syntactic parsing structure is identical to the preprocessed sentence; the fifth exclusion module further comprises:
a first submodule, which first performs equivalent substitution on the syntactic vectors in the possible matrix solution that have substitution relations with one another, thereby converting the possible matrix solution into a set of syntactic vectors [Figure PCTCN2015083760-appb-000030] that have no substitution relations with one another; the syntactic vectors in the possible matrix solution are called first-type syntactic vectors, and the syntactic vectors [Figure PCTCN2015083760-appb-000031] obtained by the conversion are called second-type syntactic vectors;
a second submodule, which takes any second-type syntactic vector and labels, one by one in the predetermined direction, the order value of each syntactic element in [Figure PCTCN2015083760-appb-000033]; after the order values have been labeled, takes any i-th syntactic element in [Figure PCTCN2015083760-appb-000034] and constructs a single gap only on the first side of that syntactic element; after the gap has been constructed, takes any second-type syntactic vector [Figure PCTCN2015083760-appb-000036] other than the syntactic vector [Figure PCTCN2015083760-appb-000035], and inserts the syntactic vector [Figure PCTCN2015083760-appb-000037] as a whole into the constructed gap, thereby generating a new syntactic vector, denoted [Figure PCTCN2015083760-appb-000038]; syntactic vectors obtained by such whole insertion are collectively called third-type syntactic vectors;
a third submodule, which, for a third-type syntactic vector [Figure PCTCN2015083760-appb-000039], labels with order values, in the predetermined direction, every syntactic element from the first syntactic element on the first side of the vector [Figure PCTCN2015083760-appb-000040] up to the first syntactic element on the second side of the vector [Figure PCTCN2015083760-appb-000042] contained in the vector [Figure PCTCN2015083760-appb-000041]; the elements located on the first side of the vector [Figure PCTCN2015083760-appb-000044] contained in the vector [Figure PCTCN2015083760-appb-000043] are not labeled with order values; the first syntactic element on the second side of the vector [Figure PCTCN2015083760-appb-000045] is denoted [Figure PCTCN2015083760-appb-000046], and the part of the vector [Figure PCTCN2015083760-appb-000047] labeled in the foregoing manner is denoted as the tail-swing syntactic vector [Figure PCTCN2015083760-appb-000048]; after the order values have been labeled, takes any j-th syntactic element in the tail-swing vector and constructs a single gap only on the first side of that element; after the gap has been constructed, takes any unused second-type syntactic vector [Figure PCTCN2015083760-appb-000049] and inserts the syntactic vector [Figure PCTCN2015083760-appb-000050] as a whole into the constructed gap, thereby generating a new syntactic vector, which is denoted [Figure PCTCN2015083760-appb-000051];
or
for a third-type syntactic vector [Figure PCTCN2015083760-appb-000052], labels every syntactic element in the syntactic vector [Figure PCTCN2015083760-appb-000053] with an order value in the predetermined direction; after the order values have been labeled, takes any t-th syntactic element in [Figure PCTCN2015083760-appb-000054] and constructs a single gap on the first side of that syntactic element; after the gap has been constructed, takes any unused second-type syntactic vector [Figure PCTCN2015083760-appb-000055] and inserts the vector [Figure PCTCN2015083760-appb-000056] as a whole into the previously constructed gap, thereby generating a new vector, which is denoted [Figure PCTCN2015083760-appb-000057];
a fourth submodule, which repeats the operation of the third submodule: each time a gap-construction and insertion step ends, it performs the next gap-construction and insertion operation on the third-type syntactic vector obtained from the previous step, until all second-type syntactic vectors [Figure PCTCN2015083760-appb-000058] have been inserted; the result is a single-row third-type syntactic vector, which is called the final single-row vector;
a fifth submodule which, if every final single-row vector corresponding to a possible syntactic parsing structure contains two order values in inverted positions, excludes that possible syntactic parsing structure;
a sixth submodule, which repeatedly invokes the operations of the second to fifth submodules until all possible syntactic parsing structures have been traversed.
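The first and second exclusion modules described above amount to simple bookkeeping over the order values in a candidate matrix solution. The sketch below illustrates one possible reading. It assumes each row is a list whose entries are an integer order value, a string naming another syntactic vector (for example `"V4"`), or `None` for the empty unit; parallel combination vectors are ignored, and the sketch also rejects duplicates inside a single row, which is slightly stricter than the text.

```python
def passes_basic_checks(matrix, n):
    """Return False if the first or second exclusion module would reject.

    matrix: list of rows (guide, subject, predicate, object entries).
    n: number of word units in the preprocessed sentence.
    """
    entries = [x for row in matrix for x in row if x is not None]
    values = [x for x in entries if isinstance(x, int)]
    refs = [x for x in entries if not isinstance(x, int)]
    if set(values) != set(range(1, n + 1)):
        return False  # first module: some order value never appears
    if len(values) != len(set(values)):
        return False  # second module: an order value appears twice
    if len(refs) != len(set(refs)):
        return False  # second module: a syntactic vector appears twice
    return True
```

Cheap checks of this kind can run before the substitution and gap-insertion steps, which are more expensive.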
Further, the apparatus also includes:
a result display component, which displays, in a human-computer interaction interface, each syntactic vector in the syntactic structure parsing result and the corresponding syntactic structure relations as a tree structure.
DRAWINGS
The above and other objects, features and advantages of the present invention will become clearer from the following description of embodiments of the invention with reference to the accompanying drawings, in which:
FIG. 1 is a flowchart of a computer-based method for parsing natural language syntactic structures according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a computer-based apparatus for parsing natural language syntactic structures according to an embodiment of the present invention.
DETAILED DESCRIPTION
The invention is described below on the basis of preferred embodiments, but it is not limited to these embodiments. In the following detailed description, some specific details are described exhaustively. Those skilled in the art can fully understand the invention without the description of these details. To avoid obscuring the essence of the invention, well-known methods, procedures, components and circuits are not described in detail.
Part A  Partial order relation and partial addition on a semigroup
Part A1  Natural language is a free monoid over the set of words and punctuation marks
According to the theory of abstract algebra and computational linguistics, natural language is a free monoid over the set of words and punctuation marks. English is used as the example below, but those skilled in the art will readily understand that the method of the present invention also applies to other natural languages.
Given a set A, a symbol string over A is formed by concatenating elements of A, with repetition allowed, into a linear array of finite length. For example, from the set {a, b, c} one can form the symbol string acbaab. This string contains three occurrences of a, two occurrences of b and one occurrence of c, and it differs from the string acaabb: although each symbol occurs the same number of times, the order differs. Symbol strings are therefore ordered. In particular, the string of length 0 is the empty string, denoted e. Thus, for a given finite symbol set A, a symbol string of length n over A is a mapping from the natural numbers N (restricted to the positions 1, ..., n) to A: $f: N \to A$.
Starting from two symbol strings, a new symbol string can be formed by joining them. For example, joining the string bbac to the right end of the string abac yields the new string abacbbac.
This operation of joining symbol strings is called the concatenation operation, or concatenation for short.
Given a symbol string $\varphi$ of length n and a symbol string $\psi$ of length m, where:
$\varphi = \{(1, x_1), (2, x_2), (3, x_3), \dots, (n-1, x_{n-1}), (n, x_n)\}$;
$\psi = \{(1, y_1), (2, y_2), (3, y_3), \dots, (m-1, y_{m-1}), (m, y_m)\}$;
the concatenation of $\varphi$ and $\psi$ is written $\varphi^{\wedge}\psi$. It is the symbol string of length n+m given by the set $\{(1, x_1), (2, x_2), (3, x_3), \dots, (n-1, x_{n-1}), (n, x_n), (n+1, y_1), (n+2, y_2), \dots, (n+m, y_m)\}$. Concatenation is thus a binary operation defined on symbol strings whose result is a new symbol string.
The concatenation sign $^{\wedge}$ may be omitted, so that the concatenation of $\varphi$ and $\psi$ is written simply $\varphi\psi$.
Hence: $\varphi^{\wedge}\psi = \varphi\psi$.
Concatenation is associative, because for any symbol strings $\varphi$, $\psi$, $\omega$:
$\varphi^{\wedge}(\psi^{\wedge}\omega) = (\varphi^{\wedge}\psi)^{\wedge}\omega$
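The set-of-pairs definition of concatenation can be checked directly with a few lines of Python. This is an illustrative sketch: a symbol string is modeled as a set of (position, symbol) pairs exactly as in the definition above, and the helper names `to_string` and `concat` are hypothetical.

```python
def to_string(text):
    """Model a symbol string as the set {(1, x1), (2, x2), ..., (n, xn)}."""
    return {(i, ch) for i, ch in enumerate(text, start=1)}


def concat(phi, psi):
    """Concatenation: keep phi and shift every position of psi by len(phi)."""
    n = len(phi)
    return phi | {(n + k, y) for k, y in psi}


abac, bbac = to_string("abac"), to_string("bbac")
joined = concat(abac, bbac)

# Associativity and the identity role of the empty string e.
x, y, z = to_string("ab"), to_string("c"), to_string("ba")
associative = concat(x, concat(y, z)) == concat(concat(x, y), z)
e = set()
```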
If every existing English word and English punctuation mark is defined as a symbol, then the set of all words and punctuation marks in S, $A = \{a_1, a_2, a_3, \dots, a_n\}$ ($n \in N$), is a symbol set.
Any finite-length symbol string $b_1 b_2 \dots b_k$ ($k \in N$) composed of English words and English punctuation marks is called a word unit or continuous word string. For any given word unit $a = b_1 b_2 \dots b_m$ ($m \in N$), a is said to be a word unit composed of elements of A if and only if $b_1, b_2, \dots, b_m \in A$.
The unique word unit of length 0 is called the empty unit, denoted e.
Let $A_s$ denote the set of all word units (continuous word strings) composed of elements of A, and let the sentence be $S = a_1 a_2 a_3 \dots a_n$, where $a_1, \dots, a_n$ are the word units constituting the sentence. The algebraic system $(A_s, {}^{\wedge}, e)$ is a free monoid over the set A of English words and punctuation marks.
The word units are arranged in the order in which they appear in the sentence, and their subscripts are their sequence numbers; let $\tau(\alpha)$ denote the number of the word unit $\alpha$ in the sentence S.
Construct a syntactic-component order mapping $\omega$ satisfying the following conditions:
(1) $\omega: \{a_1, a_2, a_3, \dots, a_n\} \to N$, where N is the set of natural numbers;
(2) for any $a_i \in S$: $\omega(a_i) = \tau(a_i)$.
Clearly, $\omega$ is an injective mapping.
Part A2  Defining a partial order relation
On the algebraic system $(A_s, {}^{\wedge}, e)$, define a binary relation $<$:
for any word units $\alpha, \beta \in A_s$, we say $\alpha < \beta$ if and only if the numbers $\tau(\alpha)$, $\tau(\beta)$ of $\alpha$ and $\beta$ satisfy $\tau(\alpha) < \tau(\beta)$.
By definition, the binary relation $<$ satisfies the following conditions:
(1) for any $a \in A_s$, $a \nless a$;
(2) for any $a, b \in A_s$, if $a < b$ then $b \nless a$;
(3) for any $a, b, c \in A_s$, if $a < b$ and $b < c$, then $a < c$.
Therefore, by the definition of a strict partial order, the binary relation $<$ is a strict partial order.
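The three conditions can be verified by brute force for any concrete numbering. The sketch below assumes the numbering $\tau$ is given as a dictionary from word-unit labels to integers; the function name and the sample labels are hypothetical.

```python
from itertools import product


def is_strict_partial_order(tau):
    """Check irreflexivity, asymmetry and transitivity of
    'a < b iff tau[a] < tau[b]' over all labelled word units."""
    items = list(tau)

    def lt(a, b):
        return tau[a] < tau[b]

    irreflexive = all(not lt(a, a) for a in items)
    asymmetric = all(not (lt(a, b) and lt(b, a))
                     for a, b in product(items, repeat=2))
    transitive = all(lt(a, c)
                     for a, b, c in product(items, repeat=3)
                     if lt(a, b) and lt(b, c))
    return irreflexive and asymmetric and transitive


tau = {"a1": 1, "a2": 2, "a3": 3, "a4": 4}
ok = is_strict_partial_order(tau)
```

Because the relation is inherited from the usual order on integers, the check succeeds for any numbering.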
Part A3  Defining partial addition and syntactic order values
On the algebraic system $(A_s, {}^{\wedge}, e)$, also define a new binary operation $+_{<}$. The operation $+_{<}$ is called the partial addition defined on the strict partial order $<$ in $A_s$, or partial addition for short. It has the following property: for any $a, b \in A_s$, if $a < b$, then $a +_{<} b = a^{\wedge}b = ab$.
It follows that for any $a, b \in A_s$ with $a < b$, partial addition $+_{<}$ and concatenation $^{\wedge}$ are equivalent. Partial addition $+_{<}$ can be regarded as concatenation restricted to the strict partial order $<$.
Any natural-language sentence S can be regarded as a word-string formula in which the word units are connected according to the strict partial order $<$, namely: $S = a_1 +_{<} a_2 +_{<} a_3 +_{<} \dots +_{<} a_n$. This property is very convenient for mathematical processing.
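A minimal sketch of partial addition follows. It assumes a word unit is modeled as a list of (order value, word) pairs and that $a < b$ holds when the last order value in a is smaller than the first order value in b; this representation and the name `partial_add` are assumptions for illustration, not the application's data structures.

```python
from functools import reduce


def partial_add(a, b):
    """Concatenate a and b, but only when a precedes b in the sentence."""
    if a and b and not a[-1][0] < b[0][0]:
        raise ValueError("partial addition undefined: a does not precede b")
    return a + b


words = ["I", "know", "him"]
units = [[(i, w)] for i, w in enumerate(words, start=1)]

# S = a1 +< a2 +< a3, folded from left to right.
sentence = reduce(partial_add, units)
text = " ".join(w for _, w in sentence)
```

Attempting to add units in the wrong order raises an error, which mirrors the restriction of concatenation to pairs with $a < b$.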
In the original sentence S, proceeding from left to right in a single pass from the beginning of the sentence to its end, label the n adjacent continuous word strings $\alpha_1, \alpha_2, \dots, \alpha_n$ of the sentence with the sequence numbers 1, 2, ..., n.
In one fixed labeling as described above, let the sequence number of any given continuous word string $\alpha$ be $\tau(\alpha)$; then $\tau(\alpha)$ is called the left-to-right order value of $\alpha$. That is, for any syntactic element $\gamma$ in the original sentence S, the syntactic order value of $\gamma$ in S is denoted $\tau(\gamma)$.
Part B  Detailed description of the technical solution
Part B1  Preliminary classification of linguistic information
In the present invention, the word units $a_i$ constituting a sentence are treated as constants. Each word unit $a_i$ has linguistic attributes. The word units that make up the core sentence structure fall into four types: parallel connective units, subordinate connective units, predicate verb units and noun-pronoun units. Each word unit includes at least one natural-language word; it may be a word, a phrase with a specific structure, or a coordination of several words with the same attribute.
A parallel connective unit may be a coordinating conjunction that connects coordinate clauses or coordinate syntactic components, such as and, but, or, so, yet.
A subordinate connective unit may be a connective pronoun or connective adverb that introduces a clause, or a connective phrase that introduces a clause. Typical introducing words include: that, what, which, who, whom, wherever, whenever, whose, where, when, why, how, whoever, whichever, while, whether, because, before, after, whatever, whomever, as, if, once, until, though, unless, although, no matter what, no matter who, no matter whom, no matter which, in that, in order that, as though, as if, even though, even if, so that. Connective units mainly comprise: connective units in which a single word introduces a clause, connective units in which a phrase introduces a clause, and connective units that join one coordinate clause to another.
对于谓语动词单元,其也可以是动词或动词短语,例如,can do,do。谓语被定义为英语中一个自然句里的主要动作语。结构上通常由两个部分构成:辅助动词+实义动词(主系表结构除外)。谓语有时态和语态的格式要求,用计算语言学的公式定义如下:For a predicate verb unit, it can also be a verb or a verb phrase, for example, can do, do. The predicate is defined as the main action language in a natural sentence in English. The structure usually consists of two parts: the auxiliary verb + the real verb (except the main table structure). The format requirements for predicate states and voices are defined by the formula of computational linguistics as follows:
Figure PCTCN2015083760-appb-000059
Figure PCTCN2015083760-appb-000060
A noun-pronoun unit may be: a pure noun phrase (a noun phrase that is not contained in a prepositional phrase); a nominalized verb phrase (defined as a verb phrase of nominal character that can serve as a nominal syntactic component such as subject or object, comprising two classes: infinitive phrases and gerund phrases); or a pronoun that can be used on its own. Examples of noun-pronoun units: food, wolf, the men, me, it, this, to do, etc.
Nominalized verb phrases are subject to format requirements, which are defined by the following computational-linguistics formulas:
No.  Pattern            No.  Pattern
1    To+VB              7    RB+To+VB
2    To+VB+VBN          8    RB+To+VB+VBN
3    To+VB+VBN+VBN      9    RB+To+VB+VBN+VBN
4    VBG                10   RB+VBG
5    VBG+VBN            11   RB+VBG+VBN
6    VBG+VBN+VBN        12   RB+VBG+VBN+VBN
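The twelve patterns in the table share one shape: an optional RB, then either To+VB or VBG, then at most two VBN. A minimal Python sketch of a checker for that shape follows; it assumes the part-of-speech tags arrive as a list of strings spelled as in the table, and the function name is hypothetical rather than taken from the source.

```python
import re
from typing import List

# Optional adverb, then an infinitive ("To VB") or gerund ("VBG") head,
# then zero, one or two past participles, as in patterns 1-12.
_NOMINALIZED_VP = re.compile(r"(RB )?(To VB|VBG)( VBN){0,2}")

def is_nominalized_verb_phrase(tags: List[str]) -> bool:
    """Return True if the tag sequence matches one of the twelve table patterns."""
    return _NOMINALIZED_VP.fullmatch(" ".join(tags)) is not None
```

For instance, `["To", "VB"]` (pattern 1) and `["RB", "VBG", "VBN", "VBN"]` (pattern 12) are accepted, while a bare `["VB"]` or a sequence with three VBN is rejected.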
{Explanation of important symbols}
r       Predicate verb unit
k       Sequence number of the predicate verb unit currently being processed
Lead    Subordinating connective unit
NPI     Pure noun unit
Conj    Coordinating connective unit
VNP     Verb unit of nominal character
NOMP    Nominative pronoun unit
OBJP    Objective pronoun unit
NP      Collective term for noun-pronoun units
In the above list of word units, the sets of word units are related as follows:
{NP} = {NPI} ∪ {VNP} ∪ {NOMP} ∪ {OBJP}.
Part B2  Definition of Important Concepts
Note: a clause of a natural-language sentence is defined as follows. A clause is a simple sentence, that is, the most basic sentence pattern of a natural language; one clause is one subject-predicate structure. The above three classes of word units form the backbone of a clause, in which the predicate verb unit serves as the predicate and a noun-pronoun unit serves as the subject or the object.
In the present invention the variables x, y, z are defined, where x is the lead element, y is the subject element and z is the object element. Writing r for the predicate element, the subject-predicate structure in each sentence can be expressed as:
f(x, y, r, z) = x +< Λ +< y +< σ +< r +< ρ +< z +< μ
Λ, σ, ρ, μ each denote any component or punctuation mark other than x, y, r, z, referred to for short as impurities; impurities can be removed by existing sentence-preprocessing techniques. After the impurities are removed the function becomes f(x, y, r, z) = x +< y +< r +< z, which is written as the vector (x, y, r, z).
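As an illustration of how a clause reduces to the vector (x, y, r, z) once impurities are dropped, here is a minimal Python sketch. It assumes each element has already been given a role label by some earlier stage; the role strings, the `"impurity"` label, the use of `None` for an absent element, and the function name are all hypothetical and not part of the source.

```python
from typing import List, Optional, Tuple

ROLES = ("x", "y", "r", "z")  # lead, subject, predicate, object

def to_vector(clause: List[Tuple[str, str]]) -> Tuple[Optional[str], ...]:
    """Drop impurities and return (x, y, r, z); a missing element is None."""
    slots = {role: None for role in ROLES}
    for text, role in clause:
        # Anything whose role is not x, y, r or z is treated as an impurity.
        if role in slots and slots[role] is None:
            slots[role] = text
    return tuple(slots[role] for role in ROLES)

clause = [("After", "x"), ("Linda", "y"), ("quietly", "impurity"),
          ("left", "r"), (".", "impurity")]
vector = to_vector(clause)
```

With this input `vector` is `("After", "Linda", "left", None)`: the adverb and the full stop are discarded, and the clause has no object.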
The lead element x is a component of a simple sentence. When the simple sentence is a subordinate clause, the lead element is the conjunctive pronoun, conjunctive adverb or connective phrase that introduces the clause; when the simple sentence is a coordinate clause, the lead element is the coordinating conjunction that joins it to the preceding coordinate clause. In other words, in a simple sentence the lead element x is a syntactic component made up of connective units that introduces the simple sentence that follows.
If a function f in S is currently being processed, this current function is written f_k, and the sequence number of the predicate verb unit currently being processed is written k (k ∈ N, where N is the set of natural numbers, and k ≤ n).
Part B3  Generating the three key sets {x_k}, {y_k}, {z_k}
Part B 3.1  Generating {x_k}
[B 3.1.1] Preparation: define the following subsets:
1) Lead_k = {Lead | Lead < r_k};
2) conj_k = {conj | conj < r_k};
3) (conj_k ∘ Lead_k) = {R_k | R_k = conj +< Lead, conj < r_k, Lead < r_k, τ(Lead) = τ(conj) + 1};
[B 3.1.2] Generation algorithm for {x_k}:
{x_k} = Lead_k ∪ conj_k ∪ (conj_k ∘ Lead_k) ∪ {e}.
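The union above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the patent's implementation: units are given as (order value, type) pairs, the type strings `"Lead"`, `"Conj"`, `"r"` and `"NP"` are shorthand for the symbols in the table of important symbols, a conj +< Lead pair is represented as a tuple, and the empty element e is represented by `None`.

```python
from typing import List, Set, Tuple

def generate_x(units: List[Tuple[int, str]], k: int) -> Set:
    """Candidate lead elements for the k-th predicate (k is 1-based)."""
    predicates = sorted(t for t, kind in units if kind == "r")
    r_k = predicates[k - 1]
    leads = {t for t, kind in units if kind == "Lead" and t < r_k}   # Lead_k
    conjs = {t for t, kind in units if kind == "Conj" and t < r_k}   # conj_k
    # conj_k o Lead_k: a conjunction immediately followed by a lead word.
    pairs = {(c, c + 1) for c in conjs if c + 1 in leads}
    return leads | conjs | pairs | {None}                            # None stands for e

units = [(1, "Lead"), (2, "NP"), (3, "NP"), (4, "Conj"), (5, "NP"),
         (6, "r"), (7, "NP"), (8, "r"), (9, "NP"), (10, "NP")]
x_1 = generate_x(units, 1)
```

For the example sentence of [B 3.2.5], `x_1` is `{1, 4, None}`: "After", "and", and the empty element. No conj +< Lead pair arises because "and" (4) is not immediately followed by a lead word.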
Part B 3.2  Special note on the method for generating coordinate syntactic components (taking coordinate subjects and coordinate objects as examples)
[B 3.2.1] Intuitive description
Note: in the following description, for ease of expression, the fact that the consecutive word-string formula Ф_t or
Figure PCTCN2015083760-appb-000061
contains the syntactic element
Figure PCTCN2015083760-appb-000062
is written as
Figure PCTCN2015083760-appb-000063
Step 1
Take all phrases of nominal character in the original sentence and collect them into one set, written Ψ = {α_1, ..., α_{m-1}, α_m}, m ∈ N, where m is the number of elements of Ψ.
Step 2
In the manner of
Figure PCTCN2015083760-appb-000064
take all combinations of any two elements of the set Ψ = {α_1, ..., α_{m-1}, α_m}:
Figure PCTCN2015083760-appb-000065
Let the set be
Figure PCTCN2015083760-appb-000066
Figure PCTCN2015083760-appb-000067
Step 3
For any given
Figure PCTCN2015083760-appb-000068
arrange the elements
Figure PCTCN2015083760-appb-000069
in ascending order of their syntactic order values in the original sentence S. Without loss of generality, assume
Figure PCTCN2015083760-appb-000070
This yields the ordered pair
Figure PCTCN2015083760-appb-000071
Construct a consecutive word-string formula
Figure PCTCN2015083760-appb-000072
where
Figure PCTCN2015083760-appb-000073
is the group of adjacent consecutive word strings, or the empty word string, in the original sentence S from
Figure PCTCN2015083760-appb-000074
to
Figure PCTCN2015083760-appb-000075
Enumerate all such ordered pairs and consecutive word-string formulas.
Step 4
Check each formula Ф_t: if every element γ of Ф_t lying between the elements
Figure PCTCN2015083760-appb-000076
and
Figure PCTCN2015083760-appb-000077
is a phrase of nominal character, a coordinating conjunction, or the empty string, then relabel Ф_t as
Figure PCTCN2015083760-appb-000078
and say that Ф_t generates
Figure PCTCN2015083760-appb-000079
Let the set be
Figure PCTCN2015083760-appb-000080
then
Figure PCTCN2015083760-appb-000081
Step 5
Take any set
Figure PCTCN2015083760-appb-000082
If the set
Figure PCTCN2015083760-appb-000083
has a corresponding
Figure PCTCN2015083760-appb-000084
then define a family of sets consisting of all sets that contain the set
Figure PCTCN2015083760-appb-000085
This family of sets is written as follows:
Figure PCTCN2015083760-appb-000086
Step 6
If the set
Figure PCTCN2015083760-appb-000087
has a corresponding family of sets
Figure PCTCN2015083760-appb-000088
then, for any such family
Figure PCTCN2015083760-appb-000089
take out all syntactic elements belonging to each set in the family and collect them into the set
Figure PCTCN2015083760-appb-000090
Figure PCTCN2015083760-appb-000091
Step 7
Take from the set
Figure PCTCN2015083760-appb-000092
the elements with the largest and the smallest syntactic order values in the original sentence S, respectively.
Note: this method can also be used to generate other types of coordinate components, for example coordinate adjective phrases. It suffices to replace all NPI, VNP and NOMP phrases of the original sentence used in the method with the corresponding syntactic components.
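Steps 1 to 4 can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the sentence is given as a mapping from order value to unit type with no gaps in the numbering, the type strings are shorthand for the symbols used in this document, and the function name is hypothetical. The subtype tags assigned to the example sentence (NPI for the nouns, NOMP for "I") are also assumptions, since the worked example only calls them noun-pronoun units.

```python
from itertools import combinations
from typing import Dict, List, Tuple

NOMINAL = {"NPI", "VNP", "NOMP"}

def coordinate_candidates(units: Dict[int, str]) -> List[Tuple[int, int]]:
    """Return the ordered pairs whose intervening elements are all nominal or Conj."""
    nominal = sorted(t for t, kind in units.items() if kind in NOMINAL)  # step 1
    passed = []
    # Steps 2 and 3: combinations of a sorted list already come out as ordered pairs.
    for a, b in combinations(nominal, 2):
        between = range(a + 1, b)          # empty range plays the role of e
        # Step 4: every element strictly between a and b must be nominal or Conj.
        if all(units[t] in NOMINAL or units[t] == "Conj" for t in between):
            passed.append((a, b))
    return passed

units = {1: "Lead", 2: "NPI", 3: "NPI", 4: "Conj", 5: "NPI",
         6: "r", 7: "NOMP", 8: "r", 9: "NPI", 10: "NPI"}
candidates = coordinate_candidates(units)
```

For the sentence "After Jack Mary and Linda left I gave my son a book", `candidates` is `[(2, 3), (2, 5), (3, 5), (9, 10)]`; every other pair spans a predicate (6 or 8) and is rejected.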
[B 3.2.2] Formal definition
Definition: the unary function A(S) takes all NPI phrases, all VNP phrases and all NOMP phrases out of the original sentence S and collects them into one set, written Ψ = {α_1, ..., α_{m-1}, α_m}, m ∈ N, where m is the number of elements of Ψ. Thus A(S) = Ψ = {α_1, ..., α_{m-1}, α_m}.
Definition: the unary function B(Ψ) takes, in the manner of
Figure PCTCN2015083760-appb-000093
all combinations of any two elements of the set Ψ = {α_1, ..., α_{m-1}, α_m}:
Figure PCTCN2015083760-appb-000094
Let the set be
Figure PCTCN2015083760-appb-000095
Figure PCTCN2015083760-appb-000096
then
Figure PCTCN2015083760-appb-000097
Writing any given
Figure PCTCN2015083760-appb-000098
as
Figure PCTCN2015083760-appb-000099
then
Figure PCTCN2015083760-appb-000100
then
Figure PCTCN2015083760-appb-000101
Figure PCTCN2015083760-appb-000102
Definition: the binary function K(α, β) operates on the result of the unary function B(Ψ); that is, for any given
Figure PCTCN2015083760-appb-000103
Figure PCTCN2015083760-appb-000104
the elements
Figure PCTCN2015083760-appb-000105
are arranged in ascending order of their syntactic order values in the original sentence S. Without loss of generality, assume
Figure PCTCN2015083760-appb-000106
This yields the ordered pair
Figure PCTCN2015083760-appb-000107
Let the set be
Figure PCTCN2015083760-appb-000108
and then construct a consecutive word-string formula
Figure PCTCN2015083760-appb-000109
where
Figure PCTCN2015083760-appb-000110
is the group of adjacent consecutive word strings, or the empty word string, in the original sentence S from
Figure PCTCN2015083760-appb-000111
to
Figure PCTCN2015083760-appb-000112
and
Figure PCTCN2015083760-appb-000113
then
Figure PCTCN2015083760-appb-000114
Definition: the unary function H(Ф_t) checks the
Figure PCTCN2015083760-appb-000115
Figure PCTCN2015083760-appb-000116
generated by the binary function K(α, β): if, for every element γ ∈ Ф_t with
Figure PCTCN2015083760-appb-000117
and
Figure PCTCN2015083760-appb-000118
it holds that γ = NPI or γ = VNP or γ = NOMP or γ = CONJ or γ = e, then Ф_t is relabelled as
Figure PCTCN2015083760-appb-000119
and Ф_t is said to generate
Figure PCTCN2015083760-appb-000120
Let the set be
Figure PCTCN2015083760-appb-000121
then
Figure PCTCN2015083760-appb-000122
Definition: the binary function M(α, β) states that, for any set
Figure PCTCN2015083760-appb-000123
if the set
Figure PCTCN2015083760-appb-000124
has a corresponding
Figure PCTCN2015083760-appb-000125
then a family of sets is defined, consisting of all sets that contain the set
Figure PCTCN2015083760-appb-000126
This family of sets is written
Figure PCTCN2015083760-appb-000127
then
Figure PCTCN2015083760-appb-000128
Figure PCTCN2015083760-appb-000129
Definition: the binary function N(α, β) operates on the result of the binary function M(α, β),
Figure PCTCN2015083760-appb-000130
that is, for any set
Figure PCTCN2015083760-appb-000131
if the set
Figure PCTCN2015083760-appb-000132
has a corresponding family of sets
Figure PCTCN2015083760-appb-000133
then a new set is constructed as follows:
Figure PCTCN2015083760-appb-000134
then
Figure PCTCN2015083760-appb-000135
Figure PCTCN2015083760-appb-000136
Definition: the unary function u(α) operates on the result of the binary function N(α, β):
Figure PCTCN2015083760-appb-000137
Take
Figure PCTCN2015083760-appb-000138
and suppose
Figure PCTCN2015083760-appb-000139
such that for any element γ,
Figure PCTCN2015083760-appb-000140
it holds that τ(γ) ≤ τ(δ). Then
Figure PCTCN2015083760-appb-000141
Definition: the unary function V(β) operates on the result of the binary function N(α, β):
Figure PCTCN2015083760-appb-000142
Take
Figure PCTCN2015083760-appb-000143
and suppose
Figure PCTCN2015083760-appb-000144
such that for any element γ,
Figure PCTCN2015083760-appb-000145
it holds that τ(δ) ≤ τ(γ). Then
Figure PCTCN2015083760-appb-000146
[B 3.2.3] Generation algorithm for coordinate subjects:
Figure PCTCN2015083760-appb-000147
[B 3.2.4] Generation algorithm for coordinate objects:
Figure PCTCN2015083760-appb-000148
[B 3.2.5] Worked example of the generation algorithms for coordinate subjects and coordinate objects
Example: the word sequence table is:
Original phrase   Phrase type                     Sequence number
After             Subordinating connective unit   1
Jack              Noun-pronoun unit               2
Mary              Noun-pronoun unit               3
and               Coordinating connective unit    4
Linda             Noun-pronoun unit               5
left              Predicate verb unit             6
I                 Noun-pronoun unit               7
gave              Predicate verb unit             8
my son            Noun-pronoun unit               9
a book            Noun-pronoun unit               10
In the course of generating the subject element sets {y_1} and {y_2}, the coordinate-subject generation algorithm runs as follows:
① A(S) takes all NPI phrases, all VNP phrases and all NOMP phrases out of the original sentence and collects them into one set, written Ψ = {Jack, Mary, Linda, I, my son, a book} = {2, 3, 5, 7, 9, 10}.
② B(Ψ) takes, in the manner of
Figure PCTCN2015083760-appb-000149
all combinations of any two elements of the set Ψ = {2, 3, 5, 7, 9, 10}. Let the set be
Figure PCTCN2015083760-appb-000150
Figure PCTCN2015083760-appb-000151
Then B(Ψ) = {{2,3}, {2,5}, {2,7}, {2,9}, {2,10}, {3,5}, {3,7}, {3,9}, {3,10}, {5,7}, {5,9}, {5,10}, {7,9}, {10,7}, {10,9}}.
③ K(α, β) operates on the result of the unary function B(Ψ); that is, for any given
Figure PCTCN2015083760-appb-000152
the elements
Figure PCTCN2015083760-appb-000153
are arranged in ascending order of their syntactic order values in the original sentence S. Without loss of generality, assume
Figure PCTCN2015083760-appb-000154
This yields the ordered pair
Figure PCTCN2015083760-appb-000155
then
Figure PCTCN2015083760-appb-000156
The ordered pairs generated are:
{<2,3>, <2,5>, <2,7>, <2,9>, <2,10>, <3,5>, <3,7>, <3,9>, <3,10>, <5,7>, <5,9>, <5,10>, <7,9>, <7,10>, <9,10>}.
Let the set be
Figure PCTCN2015083760-appb-000157
and then construct a consecutive word-string formula
Figure PCTCN2015083760-appb-000158
where
Figure PCTCN2015083760-appb-000159
is the group of adjacent consecutive word strings, or the empty word string, in the original sentence S from
Figure PCTCN2015083760-appb-000160
to
Figure PCTCN2015083760-appb-000161
and
Figure PCTCN2015083760-appb-000162
then
Figure PCTCN2015083760-appb-000163
Figure PCTCN2015083760-appb-000164
Then Ф_1 = 2+<e+<3, Ф_2 = 2+<3+<4+<5, Ф_3 = 2+<3+<4+<5+<6+<7, Ф_4 = 2+<3+<4+<5+<6+<7+<8+<9, Ф_5 = 2+<3+<4+<5+<6+<7+<8+<9+<10, Ф_6 = 3+<4+<5, Ф_7 = 3+<4+<5+<6+<7, Ф_8 = 3+<4+<5+<6+<7+<8+<9, Ф_9 = 3+<4+<5+<6+<7+<8+<9+<10, Ф_10 = 5+<6+<7, Ф_11 = 5+<6+<7+<8+<9, Ф_12 = 5+<6+<7+<8+<9+<10, Ф_13 = 7+<8+<9, Ф_14 = 7+<8+<9+<10, Ф_15 = 9+<e+<10.
④ H(Ф_t) checks the
Figure PCTCN2015083760-appb-000165
generated by the binary function K(α, β): if, for every element γ ∈ Ф_t with
Figure PCTCN2015083760-appb-000166
and
Figure PCTCN2015083760-appb-000167
it holds that γ = NPI or γ = VNP or γ = NOMP or γ = CONJ or γ = e, then Ф_t is relabelled as
Figure PCTCN2015083760-appb-000168
and Ф_t is said to generate
Figure PCTCN2015083760-appb-000169
Let the set be
Figure PCTCN2015083760-appb-000170
then the set
Figure PCTCN2015083760-appb-000171
then
Figure PCTCN2015083760-appb-000172
⑤ M(α, β) states that, for any set
Figure PCTCN2015083760-appb-000173
if the set
Figure PCTCN2015083760-appb-000174
has a corresponding
Figure PCTCN2015083760-appb-000175
then a family of sets is defined, consisting of all sets that contain the set
Figure PCTCN2015083760-appb-000176
This family of sets is written
Figure PCTCN2015083760-appb-000177
then
Figure PCTCN2015083760-appb-000178
Figure PCTCN2015083760-appb-000179
Then M(α, β) = {I_1({2,3}), I_2({3,5}), I_3({9,10})}.
⑥ N(α, β) operates on the result of the binary function M(α, β),
Figure PCTCN2015083760-appb-000180
that is, for any set
Figure PCTCN2015083760-appb-000181
Figure PCTCN2015083760-appb-000182
if the set
Figure PCTCN2015083760-appb-000183
has a corresponding family of sets
Figure PCTCN2015083760-appb-000184
then a new set is constructed as follows:
Figure PCTCN2015083760-appb-000185
This gives P[I_1({2,3})] = {2,3,4,5}, P[I_2({3,5})] = {2,3,4,5}, P[I_3({9,10})] = {9,10}.
⑦ u(α) operates on the result of the binary function N(α, β):
Figure PCTCN2015083760-appb-000186
Take
Figure PCTCN2015083760-appb-000187
and suppose
Figure PCTCN2015083760-appb-000188
such that for any element γ,
Figure PCTCN2015083760-appb-000189
it holds that τ(γ) ≤ τ(δ). Then P_max[I_1({2,3})] = 5, P_max[I_2({3,5})] = 5, P_max[I_3({9,10})] = 10.
In the course of generating the object element sets {z_1} and {z_2}, the coordinate-object generation algorithm runs as follows:
① A(S) takes all NPI phrases, all VNP phrases and all OBJP phrases out of the original sentence and collects them into one set, written Ψ = {Jack, Mary, Linda, I, my son, a book} = {2, 3, 5, 7, 9, 10}.
② B(Ψ) takes, in the manner of
Figure PCTCN2015083760-appb-000190
all combinations of any two elements of the set Ψ = {2, 3, 5, 7, 9, 10}. Let the set be
Figure PCTCN2015083760-appb-000191
Figure PCTCN2015083760-appb-000192
Then B(Ψ) = {{2,3}, {2,5}, {2,7}, {2,9}, {2,10}, {3,5}, {3,7}, {3,9}, {3,10}, {5,7}, {5,9}, {5,10}, {7,9}, {10,7}, {10,9}}.
③ K(α, β) operates on the result of the unary function B(Ψ); that is, for any given
Figure PCTCN2015083760-appb-000193
the elements
Figure PCTCN2015083760-appb-000194
are arranged in ascending order of their syntactic order values in the original sentence S. Without loss of generality, assume
Figure PCTCN2015083760-appb-000195
This yields the ordered pair
Figure PCTCN2015083760-appb-000196
then
Figure PCTCN2015083760-appb-000197
The ordered pairs generated are:
{<2,3>, <2,5>, <2,7>, <2,9>, <2,10>, <3,5>, <3,7>, <3,9>, <3,10>, <5,7>, <5,9>, <5,10>, <7,9>, <7,10>, <9,10>}.
Let the set be
Figure PCTCN2015083760-appb-000198
and then construct a consecutive word-string formula
Figure PCTCN2015083760-appb-000199
where
Figure PCTCN2015083760-appb-000200
is the group of adjacent consecutive word strings, or the empty word string, in the original sentence S from
Figure PCTCN2015083760-appb-000201
to
Figure PCTCN2015083760-appb-000202
and
Figure PCTCN2015083760-appb-000203
then
Figure PCTCN2015083760-appb-000204
Figure PCTCN2015083760-appb-000205
Then Ф_1 = 2+<e+<3, Ф_2 = 2+<3+<4+<5, Ф_3 = 2+<3+<4+<5+<6+<7, Ф_4 = 2+<3+<4+<5+<6+<7+<8+<9, Ф_5 = 2+<3+<4+<5+<6+<7+<8+<9+<10, Ф_6 = 3+<4+<5, Ф_7 = 3+<4+<5+<6+<7, Ф_8 = 3+<4+<5+<6+<7+<8+<9, Ф_9 = 3+<4+<5+<6+<7+<8+<9+<10, Ф_10 = 5+<6+<7, Ф_11 = 5+<6+<7+<8+<9, Ф_12 = 5+<6+<7+<8+<9+<10, Ф_13 = 7+<8+<9, Ф_14 = 7+<8+<9+<10, Ф_15 = 9+<e+<10.
④ H(Ф_t) checks the
Figure PCTCN2015083760-appb-000206
generated by the binary function K(α, β): if, for every element γ ∈ Ф_t with
Figure PCTCN2015083760-appb-000207
and
Figure PCTCN2015083760-appb-000208
it holds that γ = NPI or γ = VNP or γ = NOMP or γ = CONJ or γ = e, then Ф_t is relabelled as
Figure PCTCN2015083760-appb-000209
and Ф_t is said to generate
Figure PCTCN2015083760-appb-000210
Let the set be
Figure PCTCN2015083760-appb-000211
then the set
Figure PCTCN2015083760-appb-000212
then
Figure PCTCN2015083760-appb-000213
⑤ M(α, β) states that, for any set
Figure PCTCN2015083760-appb-000214
if the set
Figure PCTCN2015083760-appb-000215
has a corresponding
Figure PCTCN2015083760-appb-000216
then a family of sets is defined, consisting of all sets that contain the set
Figure PCTCN2015083760-appb-000217
This family of sets is written
Figure PCTCN2015083760-appb-000218
then
Figure PCTCN2015083760-appb-000219
Figure PCTCN2015083760-appb-000220
Then M(α, β) = {I_1({2,3}), I_2({3,5}), I_3({9,10})}.
⑥ N(α, β) operates on the result of the binary function M(α, β),
Figure PCTCN2015083760-appb-000221
that is, for any set
Figure PCTCN2015083760-appb-000222
Figure PCTCN2015083760-appb-000223
if the set
Figure PCTCN2015083760-appb-000224
has a corresponding family of sets
Figure PCTCN2015083760-appb-000225
then a new set is constructed as follows:
Figure PCTCN2015083760-appb-000226
This gives P[I_1({2,3})] = {2,3,4,5}, P[I_2({3,5})] = {2,3,4,5}, P[I_3({9,10})] = {9,10}.
⑦ V(β) operates on the result of the binary function N(α, β):
Figure PCTCN2015083760-appb-000227
Take
Figure PCTCN2015083760-appb-000228
and suppose
Figure PCTCN2015083760-appb-000229
such that for any element γ,
Figure PCTCN2015083760-appb-000230
it holds that τ(δ) ≤ τ(γ). Then P_min[I_1({2,3})] = 2, P_min[I_2({3,5})] = 2, P_min[I_3({9,10})] = 9.
Part B 3.3: Method for generating {y_k}
[B 3.3.1] Preparation: define the following subsets:
1) NPI_yk = {NPI | NPI < r_k}.
2) VNP_yk = {VNP | VNP < r_k}.
3) NOMP_k = {NOMP | NOMP < r_k}.
4)
Figure PCTCN2015083760-appb-000231
where:
Figure PCTCN2015083760-appb-000232
5) ry_k = {r_α | α < k, α ∈ N} (N is the set of natural numbers).
6) fy_k = {f_α | α < k, α ∈ N} (N is the set of natural numbers).
[B 3.3.2] Algorithm for generating {y_k}
Figure PCTCN2015083760-appb-000233
② Conversion: when r_(k-1) exists, {y_k} = NPI_yk ∪ VNP_yk ∪ NOMP_k ∪ G_k ∪ fy_k ∪ {e}, and the expression above becomes:
Figure PCTCN2015083760-appb-000234
Part B 3.4: Method for generating {z_k}
[B 3.4.1] Preparation: define the following subsets:
Figure PCTCN2015083760-appb-000235
Figure PCTCN2015083760-appb-000236
where:
Figure PCTCN2015083760-appb-000237
5) rz_k = {r_α | k < α, α ∈ N} (N is the set of natural numbers).
6) fz_k = {f_α | k < α, α ∈ N} (N is the set of natural numbers).
[B 3.4.2] Algorithm for generating {z_k}
Figure PCTCN2015083760-appb-000238
② Conversion: when r_(k+1) exists, {z_k} = NPI_zk ∪ VNP_zk ∪ OBJP_k ∪ H_k ∪ fz_k ∪ {e}, and the expression above becomes:
Figure PCTCN2015083760-appb-000239
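The index-based subsets defined in the text (ry_k, fy_k, rz_k, fz_k) and the "precedes r_k" subsets (such as NPI_yk) can be illustrated with a minimal sketch. It assumes that the relation < is evaluated by comparing integer order values and that the functions are indexed 1..n; the helper names `index_sets` and `units_before` are hypothetical and do not appear in the source.

```python
def index_sets(k, n):
    """Indices alpha in 1..n on the y side (alpha < k) and the z side (k < alpha)."""
    y_side = [a for a in range(1, n + 1) if a < k]  # indexes ry_k and fy_k
    z_side = [a for a in range(1, n + 1) if k < a]  # indexes rz_k and fz_k
    return y_side, z_side


def units_before(order_values, r_k):
    """Order values of word units that precede the predicate at order value r_k."""
    return [v for v in order_values if v < r_k]


# Hypothetical data: three predicate functions, NPI units at positions 1, 4 and 8.
print(index_sets(2, 3))            # ([1], [3])
print(units_before([1, 4, 8], 5))  # [1, 4]
```

The sets defined only in the formula images (for example G_k and H_k) are deliberately left out, since their definitions are not recoverable from the text.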
Part B 3.5: Matrix expression and linear expression
[B 3.5.1] Matrix expression
The sentence S can then be expressed in matrix form, namely:
Figure PCTCN2015083760-appb-000240
When a function f_j serves as the subject element or object element of another function f_k, for example when f_k = x +< y +< r +< f_j or f_k = x +< f_j +< r +< y, f_k is said to be obtained by a compound operation. In the present invention the compound operation is denoted f(f).
Since a function f, taken as a whole, is also a word unit, the partial addition operation applies to functions. If the functions f_i and f_j satisfy f_i < f_j, and another function f_k can be expressed as the partial sum of f_i and f_j, i.e. f_k = f_i +< f_j, then f_k is said to be obtained by a partial addition operation.
Every English sentence S that does not omit its predicate verb can be regarded as obtained from n functions f_1, ..., f_n (n equals the number of predicate verb units) through a finite number of compound and partial addition operations. Accordingly, any English sentence S that does not omit its predicate can be written as:
Figure PCTCN2015083760-appb-000241
That is, any English sentence that does not omit its predicate is obtained by compound or partial addition operations on vectors comprising a guide element, a subject element, a predicate element or an object element. The next question is which expression to choose for an English natural sentence S. The expression must indicate exactly all the compound and partial addition operations contained in S. The matrix form meets this requirement: it represents the compound operation of functions by the position of an element within a row vector, for example f_k(f_j) = f_k(x_k, f_j, r_k, z_k) indicates the compound relationship between f_k and f_j; at the same time it preserves the partial addition relationship among the elements: f_k = x_k +< f_j +< r_k +< z_k. In summary, in order to express the English natural sentence S accurately, intuitively and clearly, and to better reveal its intrinsic mathematical structure, the matrix is adopted as the primary expression of S.
[B 3.5.2] Linear expression
The sentence S can also be expressed in linear form, namely:
Figure PCTCN2015083760-appb-000242
Note in particular:
① The linear expression of every English natural sentence S that does not omit its predicate contains a finite number of partial addition operations and compound operations. This document uses the linear expression as a supplementary expression of S.
② The matrix expression and the linear expression of the present invention are equivalent.
③ The linear expression of an English natural sentence S is also, naturally, a system of linear equations whose unknowns are the functions f_1, ..., f_n (n equals the number of predicate verb units). The substitution procedure used below to obtain the syntactic parsing result can therefore be viewed as the process of solving this system of equations.
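The equivalence between the matrix expression and the linear expression can be sketched as follows. This is an illustration only: it assumes a matrix is stored as a dictionary mapping a function name to its row (x, y, r, z), with integers for order values, "e" for an empty element and a function name where another function is nested. The representation and the name `linear_expression` are assumptions, not taken from the source.

```python
def linear_expression(matrix, name):
    """Render the row of `name` as a linear expression, substituting nested functions."""
    parts = []
    for element in matrix[name]:
        if element in matrix:
            # The element is itself a function: a compound operation f(f).
            parts.append("(" + linear_expression(matrix, element) + ")")
        else:
            parts.append(str(element))
    return " +< ".join(parts)


# Hypothetical two-row matrix: f2 serves as the object element of f1.
matrix = {
    "f1": (1, 2, 3, "f2"),
    "f2": ("e", 4, 5, 6),
}
print(linear_expression(matrix, "f1"))  # 1 +< 2 +< 3 +< (e +< 4 +< 5 +< 6)
```

The position of "f2" inside the row of f1 carries the compound relationship, while joining the row with +< keeps the partial addition relationship, which is the property the text attributes to the matrix form.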
Part B 3.6: Substitution procedure for solving the matrix
Step 1
If some order value does not appear in a possible matrix solution of the syntactic structure, that possible matrix solution is excluded. For example, in the following possible matrix solution
Figure PCTCN2015083760-appb-000243
the word unit numbered 4 does not appear, so the solution is excluded.
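The Step 1 check can be sketched in a few lines. The matrix below is a hypothetical example, not the one shown in the figure, and the dictionary representation (function name mapped to a row (x, y, r, z), with "e" for an empty element) is an assumption made for illustration.

```python
def missing_order_values(matrix, sentence_length):
    """Return the order values 1..sentence_length that occur in no row of the matrix."""
    seen = {el for row in matrix.values() for el in row if isinstance(el, int)}
    return sorted(set(range(1, sentence_length + 1)) - seen)


# Hypothetical candidate for a sentence of 7 word units.
candidate = {
    "f1": (1, 2, 3, "f2"),
    "f2": ("e", 5, 6, 7),
}
print(missing_order_values(candidate, 7))  # [4]
```

A non-empty result means the candidate is excluded under Step 1.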
Step 2
If the same order value appears in different syntactic vectors, or the same syntactic vector appears more than once, the possible matrix solution is excluded.
For example, in the following possible matrix solution
Figure PCTCN2015083760-appb-000244
the word unit numbered 5 appears twice, so the solution is excluded.
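A minimal sketch of the Step 2 check, under the same assumed representation (a dictionary from function name to a row of order values, "e" placeholders and nested function names). The example matrix is hypothetical and is not the matrix in the figure.

```python
from collections import Counter


def repeated_elements(matrix):
    """Return order values or function names that occur more than once in the matrix."""
    counts = Counter(el for row in matrix.values() for el in row if el != "e")
    return sorted((el for el, n in counts.items() if n > 1), key=str)


# Hypothetical candidate in which order value 5 is used twice.
candidate = {
    "f1": (1, 2, 3, "f2"),
    "f2": (4, 5, 6, 5),
}
print(repeated_elements(candidate))  # [5]
```

The empty element "e" is skipped because it may legitimately occur in several rows.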
Step 3
In each possible matrix solution, equivalent substitution is performed for all syntactic vectors whose position can be determined unambiguously. If, after substitution, two syntactic vectors are in cross-contradiction, the possible matrix solution is excluded.
For example, in the following possible matrix solution
Figure PCTCN2015083760-appb-000245
substituting into the matrix shows a cross-contradiction between f_2 and f_3. Substitution gives f_2 = 3 +< e +< 6 +< (4 +< f_2 +< 7 +< e). f_2 appears on both sides of the equation, which is a logical contradiction. The solution is excluded.
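The cross-contradiction in Step 3 amounts to a function that contains itself after substitution, which can be detected as a cycle among the function references. The sketch below assumes the dictionary representation of rows (x, y, r, z) used for illustration; the two rows are read off the expansion quoted in the text, f_2 = 3 +< e +< 6 +< (4 +< f_2 +< 7 +< e), and any other rows of the figure are not reproduced.

```python
def has_cross_contradiction(matrix):
    """True if some function would contain itself after equivalent substitution."""
    def reaches(start, current, visited):
        for el in matrix[current]:
            if el not in matrix:      # plain order value or "e"
                continue
            if el == start:           # substitution leads back to the start
                return True
            if el not in visited:
                visited.add(el)
                if reaches(start, el, visited):
                    return True
        return False

    return any(reaches(name, name, {name}) for name in matrix)


cyclic = {"f2": (3, "e", 6, "f3"), "f3": (4, "f2", 7, "e")}
acyclic = {"f1": (1, 2, 3, "f2"), "f2": ("e", 4, 5, 6)}
print(has_cross_contradiction(cyclic))   # True
print(has_cross_contradiction(acyclic))  # False
```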
Step 4
In each possible matrix solution, equivalent substitution is performed for all syntactic vectors whose position can be determined unambiguously. If, after substitution, two order values appear in reversed positions, the possible matrix solution is excluded. This is both a basic requirement of the mathematical treatment and an essential requirement of the partial addition operation defined on the strict partial order <.
For example, in the following possible matrix solution
Figure PCTCN2015083760-appb-000246
substitution gives f_2 = 4 +< 5 +< 6 +< 3 +< e +< 7 +< e, i.e. the order (4, 5, 6, 3, e, 7, e), in which order values appear in reversed positions. The solution is excluded.
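The Step 4 check can be sketched as a full substitution followed by a monotonicity test. The rows below are one hypothetical reading that reproduces the expansion quoted in the text, (4, 5, 6, 3, e, 7, e); the actual matrix is only available in the figure. The dictionary representation and the function names are assumptions.

```python
def expand(matrix, name):
    """Fully substitute nested functions. Assumes Step 3 already ruled out cycles."""
    out = []
    for el in matrix[name]:
        out.extend(expand(matrix, el) if el in matrix else [el])
    return out


def is_order_preserved(sequence):
    """True if the order values, ignoring empty elements "e", strictly increase."""
    values = [el for el in sequence if el != "e"]
    return all(a < b for a, b in zip(values, values[1:]))


# Hypothetical rows chosen so that the expansion matches the sequence in the text.
candidate = {"f2": (4, 5, 6, "f3"), "f3": (3, "e", 7, "e")}
row = expand(candidate, "f2")
print(row)                      # [4, 5, 6, 3, 'e', 7, 'e']
print(is_order_preserved(row))  # False
```

Here 6 precedes 3, so the candidate is excluded.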
Step 5
In any possible matrix solution, if there is a syntactic vector that has no substitution relation with the other syntactic vectors, gap creation and whole insertion are performed to obtain all possible syntactic parsing structures corresponding to that possible matrix solution, and it is verified whether the sentence obtained from each possible syntactic parsing structure is identical to the preprocessed sentence. This further comprises:
5.5.1. First perform equivalent substitution on the syntactic vectors in the possible matrix solution that have substitution relations with one another, thereby transforming the possible matrix solution into a set of syntactic vectors with no substitution relations among them
Figure PCTCN2015083760-appb-000247
The syntactic vectors in the possible matrix solution are called first-type syntactic vectors, and the syntactic vectors obtained by the transformation
Figure PCTCN2015083760-appb-000248
are called second-type syntactic vectors;
5.5.2. Take any second-type syntactic vector
Figure PCTCN2015083760-appb-000249
and label, one by one in a predetermined direction, the order value of each syntactic element in
Figure PCTCN2015083760-appb-000250
After the order values are labeled, take the i-th syntactic element in
Figure PCTCN2015083760-appb-000251
and construct a single gap only on the first side of that syntactic element; after the gap is created, take any second-type syntactic vector other than the syntactic vector
Figure PCTCN2015083760-appb-000252
namely
Figure PCTCN2015083760-appb-000253
and insert the syntactic vector
Figure PCTCN2015083760-appb-000254
as a whole into the constructed gap, generating a new syntactic vector, which is denoted
Figure PCTCN2015083760-appb-000255
Syntactic vectors obtained by whole insertion are collectively called third-type syntactic vectors;
5.5.3. For the third-type syntactic vector
Figure PCTCN2015083760-appb-000256
proceed in the predetermined direction: starting from the first syntactic element on the first side of vector
Figure PCTCN2015083760-appb-000257
and continuing, within vector
Figure PCTCN2015083760-appb-000258
up to the first syntactic element on the second side of its contained vector
Figure PCTCN2015083760-appb-000259
label every syntactic element with an order value; the elements that lie, within vector
Figure PCTCN2015083760-appb-000260
on the first side of its contained vector
Figure PCTCN2015083760-appb-000261
are not labeled; the first syntactic element on the second side of vector
Figure PCTCN2015083760-appb-000262
is denoted
Figure PCTCN2015083760-appb-000263
and the part of vector
Figure PCTCN2015083760-appb-000264
labeled in the above manner is denoted as the tail-dropped syntactic vector
Figure PCTCN2015083760-appb-000265
After the order values are labeled, take the j-th syntactic element of the tail-dropped vector and construct a single gap only on the first side of that element; after the gap is created, take any unused second-type syntactic vector
Figure PCTCN2015083760-appb-000266
and insert the syntactic vector
Figure PCTCN2015083760-appb-000267
as a whole into the constructed gap, generating a new syntactic vector, which is denoted
Figure PCTCN2015083760-appb-000268
or, alternatively,
for the third-type syntactic vector
Figure PCTCN2015083760-appb-000269
label, in the predetermined direction, every syntactic element of the syntactic vector
Figure PCTCN2015083760-appb-000270
with an order value; after the order values are labeled, take the t-th syntactic element in
Figure PCTCN2015083760-appb-000271
and construct a single gap on one side of that syntactic element; after the gap is created, take any unused second-type syntactic vector
Figure PCTCN2015083760-appb-000272
and insert the vector
Figure PCTCN2015083760-appb-000273
as a whole into the constructed gap, generating a new vector, which is denoted
Figure PCTCN2015083760-appb-000274
5.5.4. Repeat 5.5.3: each time a gap-creation and insertion step finishes, apply the next gap creation and insertion to the third-type syntactic vector it produced, until all second-type syntactic vectors
Figure PCTCN2015083760-appb-000275
have been inserted. The result is a single-row third-type syntactic vector, which is called the final single-row vector;
5.5.5. If every final single-row vector corresponding to a possible syntactic parsing structure contains two order values in reversed positions, that possible syntactic parsing structure is excluded;
5.5.6. Repeat 5.5.2 to 5.5.5 until all possible syntactic parsing structures have been traversed.
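Steps 5.5.2 to 5.5.5 can be read as an exhaustive search over whole insertions. The sketch below follows the simpler alternative branch of 5.5.3, in which a gap may be opened beside any element of the current vector, and always places the gap on the right side of the chosen element. The list representation of second-type vectors and the function names are assumptions made for illustration; it is not presented as the patent's implementation.

```python
def whole_insertions(vectors):
    """Yield every final single-row vector reachable by repeated whole insertion."""
    def grow(current, remaining):
        if not remaining:
            yield tuple(current)
            return
        for k, beta in enumerate(remaining):
            rest = remaining[:k] + remaining[k + 1:]
            # gap = index just to the right of element number `gap` (1-based)
            for gap in range(1, len(current) + 1):
                yield from grow(current[:gap] + beta + current[gap:], rest)

    for k, alpha in enumerate(vectors):
        others = [list(v) for v in vectors[:k] + vectors[k + 1:]]
        yield from grow(list(alpha), others)


def in_sentence_order(row):
    """True if no two order values appear in reversed positions ("e" is ignored)."""
    values = [el for el in row if el != "e"]
    return all(a < b for a, b in zip(values, values[1:]))


# Hypothetical second-type vectors with no substitution relation between them.
second_type = [[1, 2, 5, 6], [3, 4]]
valid = sorted({row for row in whole_insertions(second_type) if in_sentence_order(row)})
print(valid)  # [(1, 2, 3, 4, 5, 6)]
```

If `valid` comes out empty, every final single-row vector contains reversed order values and the candidate is excluded, as in 5.5.5.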
Part B 3.7: Matrix correction procedure
When necessary, the correction procedure is invoked to revise cases in which two or more syntactic parsing results remain. It comprises the following operations:
(1) Re-check and select among the noun and pronoun units serving as subject and object.
(2) Check the syntactic structure using language rules. Examples:
① According to the rules of English syntactic structure, the introductory word of a subject clause cannot be omitted: the "that" introducing a subject clause cannot be omitted;
② According to the rules of English syntactic structure, the subject must agree with the predicate in person and number;
③ Determine from the transitive or intransitive nature of the verb whether an object follows it.
(3) Re-check and eliminate structural ambiguity.
(4) Treat inversion, ellipsis and "there be" as special cases.
(5) Put the extracted components back.
(6) Generate and output the final solution.
The correction overcomes some problems of irregular sentence structure and improves parsing accuracy.
Preferably, the syntactic structure can be formed into a syntax tree data structure according to the parsing result.
Part B 3.8: Specific description of the two insertion methods
[B 3.8.1] Principle common to the two insertion methods:
The order value sequence 1, 2, ..., k of the original sentence can be regarded as obtained through equivalent substitution of the syntactic vectors in the possible matrix solution whose positions can be determined unambiguously, together with a finite number of whole insertions among the syntactic vectors whose positions cannot be determined. That is, the initial syntactic vector corresponding to the original sentence
Figure PCTCN2015083760-appb-000276
can be regarded as obtained first through equivalent substitution of the syntactic vectors whose positions can be determined unambiguously, and then through a finite number of whole insertions among the syntactic vectors whose positions cannot be determined. The different insertion cases are, in essence, the permutations and combinations of combinatorics.
[B 3.8.2] First insertion method:
In any possible matrix solution, if there exists a syntactic vector that has no definite substitution relation with any other syntactic vector, first perform equivalent substitution on all the syntactic vectors in the possible matrix solution that do have substitution relations with other syntactic vectors, while leaving unchanged all the syntactic vectors that have no substitution relation with the others. Taken together, these two steps transform the possible matrix solution into a set of syntactic vectors, none of which has a substitution relation with any other
Figure PCTCN2015083760-appb-000277
The original syntactic vectors f_1, f_2, ..., f_δ of the possible matrix solution are collectively called first-type syntactic vectors; the set of syntactic vectors without mutual substitution relations obtained by the above equivalent substitution
Figure PCTCN2015083760-appb-000278
are collectively called second-type syntactic vectors. Note that no two second-type syntactic vectors have a substitution relation with each other. Whole insertion is meaningful only when θ ≥ 2; the discussion below assumes θ ≥ 2 throughout.
Next, perform one-sided, same-direction, order-preserving whole insertion, which may also be called one-sided forward order-preserving whole insertion: take any second-type syntactic vector
Figure PCTCN2015083760-appb-000279
and label, one by one from right to left (left to right is also possible), the order value of each syntactic element of the syntactic vector
Figure PCTCN2015083760-appb-000280
After the order values are labeled, take any syntactic element of
Figure PCTCN2015083760-appb-000281
and suppose it is, within
Figure PCTCN2015083760-appb-000282
the i-th element counting from the right; construct a single gap only on the right side of that syntactic element (alternatively, only on the left side). After the gap is created, take any second-type syntactic vector other than the syntactic vector
Figure PCTCN2015083760-appb-000283
namely
Figure PCTCN2015083760-appb-000284
and insert the syntactic vector
Figure PCTCN2015083760-appb-000285
as a whole into the gap constructed above, generating a new syntactic vector, which is denoted
Figure PCTCN2015083760-appb-000286
All syntactic vectors obtained by whole insertion are collectively called third-type syntactic vectors, so
Figure PCTCN2015083760-appb-000287
is a third-type syntactic vector. For any two syntactic vectors α and β, if inserting β as a whole into the gap belonging to the i-th syntactic element of α counting from the right yields a new third-type syntactic vector, that vector is denoted [α]_i +< β. Note that third-type syntactic vectors likewise have no substitution relations with one another. This completes the first gap-creation and insertion step.
Proceed to the second gap-creation and insertion step. For the third-type syntactic vector obtained in the first step
Figure PCTCN2015083760-appb-000288
proceed from right to left (left to right is also possible, but the direction must be the same as in the previous labeling, i.e. on the same side): starting from the first syntactic element counting from the right of vector
Figure PCTCN2015083760-appb-000289
and continuing, within vector
Figure PCTCN2015083760-appb-000290
up to the first syntactic element counting from the left of its contained vector
Figure PCTCN2015083760-appb-000291
label every syntactic element with an order value; the syntactic elements that lie, within vector
Figure PCTCN2015083760-appb-000292
to the left of its contained vector
Figure PCTCN2015083760-appb-000293
are not labeled; the first syntactic element counting from the left of vector
Figure PCTCN2015083760-appb-000294
is denoted
Figure PCTCN2015083760-appb-000295
and the part of vector
Figure PCTCN2015083760-appb-000296
labeled in this manner is denoted as the syntactic vector
Figure PCTCN2015083760-appb-000297
which is called the tail-dropped vector. After the order values are labeled, take any syntactic element of the tail-dropped vector and suppose it is, within the tail-dropped vector
Figure PCTCN2015083760-appb-000298
the j-th element counting from the right; construct a single gap only on the right side of that element (alternatively only on the left side, but on the same side as in the previous gap creation). After the gap is created, take any second-type syntactic vector other than the syntactic vectors used in the first step
Figure PCTCN2015083760-appb-000299
and
Figure PCTCN2015083760-appb-000300
namely
Figure PCTCN2015083760-appb-000301
and insert the syntactic vector
Figure PCTCN2015083760-appb-000302
as a whole into the gap constructed above, generating a new syntactic vector, which is denoted
Figure PCTCN2015083760-appb-000303
For any syntactic vectors α and β, denote the first syntactic element of β counting from the left by λ(β). If the syntactic vector [α]_i +< β exists, label it in the manner described above and denote the labeled part as the vector [α_k \ λ(β_(k-1))], which is called the tail-dropped vector. This completes the second gap-creation and insertion step.
Following the same method, select the tail-dropped vector of the third-type syntactic vector obtained in the previous gap-creation and insertion step, and label order values on it as described above, in the same direction (on the same side) as the previous labeling. After the order values are labeled, take any syntactic element of that tail-dropped vector and construct a single one-sided gap as described above, on the same side as in the previous gap creation. After the gap is created, take any second-type syntactic vector not used in the previous steps and insert it as a whole into the gap, generating a new syntactic vector. Repeat these operations: each time a gap-creation and insertion step finishes, apply the next gap creation and insertion to the third-type syntactic vector it produced, until all second-type syntactic vectors
Figure PCTCN2015083760-appb-000304
have been inserted. The result is a single-row third-type syntactic vector, which is called the final single-row vector.
The complete flow from the first choice of a second-type syntactic vector
Figure PCTCN2015083760-appb-000305
to the generation of a final single-row vector is treated as one concrete scheme, so that each gap-creation and insertion step above is one step of that scheme.
By exhausting all possibilities at every step, all schemes are enumerated. Every final single-row vector generated in this way is checked, and those in which two order values appear in reversed positions are deleted.
If every final single-row vector generated by the enumeration contains two order values in reversed positions, this violates natural law; all the final single-row vectors are excluded, and with them the possible matrix solution of the syntactic structure.
Every final single-row vector without reversed order values conforms to natural law and is a valid final single-row vector. Valid final single-row vectors are retained as correct results, and the possible matrix solution of the syntactic structure is likewise retained as a correct result, for use in generating the syntax tree.
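A single gap-creation and insertion step of the first method, together with the tail-dropped vector it leaves behind, can be sketched as follows. The sketch assumes vectors are Python lists read left to right, counts the chosen element from the right and opens the gap on its right side, as in the default direction of the text. The function names `insert_right_of` and `tail_dropped` are hypothetical.

```python
def insert_right_of(alpha, i, beta):
    """Insert beta as a whole into the gap right of the i-th element of alpha,
    counting from the right. Returns the new vector and the index where beta starts."""
    if not 1 <= i <= len(alpha):
        raise ValueError("i must select an element of alpha")
    gap = len(alpha) - i + 1
    return alpha[:gap] + beta + alpha[gap:], gap


def tail_dropped(vector, beta_start):
    """Elements from the right end back to the leftmost element of the inserted vector;
    only these are eligible for the next gap in the order-preserving method."""
    return vector[beta_start:]


# Hypothetical vectors: [alpha]_3 +< beta with alpha = (1, 2, 5, 6), beta = (3, 4).
combined, start = insert_right_of([1, 2, 5, 6], 3, [3, 4])
print(combined)                       # [1, 2, 3, 4, 5, 6]
print(tail_dropped(combined, start))  # [3, 4, 5, 6]
```

Restricting later gaps to the tail-dropped part means each later insertion lands at or to the right of the start of the previous one. This presumably keeps the enumeration from generating the same final single-row vector along several paths, although the text does not state that motivation.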
[B 3.8.3] Second insertion method:
In any possible matrix solution, if there exists a syntactic vector that has no definite substitution relation with any other syntactic vector, first perform equivalent substitution on all the syntactic vectors in the possible matrix solution that do have substitution relations with other syntactic vectors, while leaving unchanged the syntactic vectors that have no substitution relation with the others. Taken together, these two steps transform the possible matrix solution into a set of syntactic vectors, none of which has a substitution relation with any other
Figure PCTCN2015083760-appb-000306
The original syntactic vectors f_1, f_2, ..., f_δ of the possible matrix solution are collectively called first-type syntactic vectors; the set of syntactic vectors without mutual substitution relations obtained by the above equivalent substitution
Figure PCTCN2015083760-appb-000307
are collectively called second-type syntactic vectors. Note that no two second-type syntactic vectors have a substitution relation with each other. Whole insertion is meaningful only when θ ≥ 2; the discussion below assumes θ ≥ 2 throughout.
Next, perform one-sided, same-direction, non-order-preserving whole-vector gap insertion (also called one-sided forward non-order-preserving whole-vector gap insertion). Take any second-type syntactic vector
Figure PCTCN2015083760-appb-000308
and, proceeding from right to left (left to right is also permitted), label every syntactic element of the syntactic vector
Figure PCTCN2015083760-appb-000309
with an order value, one element at a time. After labeling, take any syntactic element of
Figure PCTCN2015083760-appb-000310
and suppose, without loss of generality, that it is the m-th element from the right of
Figure PCTCN2015083760-appb-000311
Create exactly one gap, only on the right side of that element (alternatively, only on its left side). After creating the gap, exclude the syntactic vector
Figure PCTCN2015083760-appb-000312
and take any other second-type syntactic vector
Figure PCTCN2015083760-appb-000313
Insert the vector
Figure PCTCN2015083760-appb-000314
as a whole into the gap just created, generating a new syntactic vector, which is denoted
Figure PCTCN2015083760-appb-000315
All syntactic vectors obtained by whole-vector gap insertion are collectively called third-type syntactic vectors. For any two syntactic vectors α and β, if inserting β as a whole into the gap associated with the m-th syntactic element from the right of α yields a new third-type syntactic vector, that vector is denoted (α)m+<β. This completes the first gap-creation and insertion step.
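The single gap-creation and insertion step described above can be sketched in Python. This is only an illustrative sketch under stated assumptions, not the patent's implementation: a syntactic vector is modeled as a flat list of elements, and the names `insert_whole` and `side` are hypothetical.

```python
def insert_whole(alpha, beta, m, side="right"):
    """Insert beta as one block next to the m-th element from the right of alpha.

    alpha, beta: lists of syntactic elements (an assumed representation).
    m: 1-based position counted from the right end of alpha.
    side: which side of the chosen element receives the single gap.
    """
    if not 1 <= m <= len(alpha):
        raise ValueError("m must satisfy 1 <= m <= len(alpha)")
    idx = len(alpha) - m                      # 0-based index of the chosen element
    cut = idx + 1 if side == "right" else idx
    return alpha[:cut] + beta + alpha[cut:]


# Gap on the right of the 3rd element from the right ("eC").
result = insert_whole(["7", "eC", "8", "eD"], ["eA", "5", "6", "eB"], 3)
print(result)
```

Keeping `side` fixed across all steps mirrors the requirement that every gap is created on the same side as the previous one.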
Proceed to the second gap-creation and insertion step. Take the third-type syntactic vector obtained from the first gap-creation and insertion step,
Figure PCTCN2015083760-appb-000316
and, proceeding from right to left (left to right is also permitted, provided the direction is the same as in the previous labeling, i.e. starting from the same side), label every syntactic element of the syntactic vector
Figure PCTCN2015083760-appb-000317
with an order value. After labeling, take any syntactic element of
Figure PCTCN2015083760-appb-000318
and suppose, without loss of generality, that it is the t-th syntactic element from the right of
Figure PCTCN2015083760-appb-000319
Create exactly one gap, only on the right side of that element (alternatively only on its left side, provided the side is the same as in the previous gap creation). After creating the gap, exclude the syntactic vectors already used in the first gap-creation and insertion step,
Figure PCTCN2015083760-appb-000320
and
Figure PCTCN2015083760-appb-000321
and take any remaining second-type syntactic vector
Figure PCTCN2015083760-appb-000322
Insert the vector
Figure PCTCN2015083760-appb-000323
as a whole into the gap just created, generating a new vector, which is denoted
Figure PCTCN2015083760-appb-000324
Figure PCTCN2015083760-appb-000325
This completes the second gap-creation and insertion step.
Continue in the same way. Label the third-type syntactic vector obtained from the previous gap-creation and insertion step with order values, in the same direction as the previous labeling (i.e. starting from the same side). After labeling, take any syntactic element of that third-type syntactic vector and create exactly one one-sided gap as described above, on the same side as in the previous gap creation. After creating the gap, take any second-type syntactic vector that has not been used in an earlier step and insert it as a whole into the gap, generating a new syntactic vector. Repeat this operation: whenever one gap-creation and insertion step ends, apply the next one to the third-type syntactic vector it produced, until all second-type syntactic vectors
Figure PCTCN2015083760-appb-000326
have been inserted, which leaves a single-row third-type syntactic vector. This last third-type syntactic vector is called the final single-row vector.
One complete pass, from the first selection of a second-type syntactic vector
Figure PCTCN2015083760-appb-000327
to the generation of the final single-row vector, is called a concrete scheme. Each gap-creation and insertion step described above is therefore one step of a concrete scheme.
Enumerate all schemes by exhausting every possible choice at every step. Examine each final single-row vector generated by this enumeration: treating occurrences of e at different positions in the possible matrix solution as distinct, keep one copy of any two or more identical final single-row vectors and delete the redundant duplicates; then delete every final single-row vector in which two order values appear in reversed positions.
If, after the duplicates are deleted, every final single-row vector contains two order values in reversed positions, natural law is violated: all final single-row vectors are discarded, and this possible matrix solution of the syntactic structure is discarded with them.
After the duplicates are deleted, every final single-row vector in which no two order values appear in reversed positions conforms to natural law and is a valid final single-row vector. Each valid final single-row vector is retained as one of the correct results, and this possible matrix solution of the syntactic structure is retained as one of the correct results, for use in generating the syntax tree.
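The exhaustive procedure of the second method (enumerate, remove duplicates, discard vectors with reversed order values) can be sketched in Python as follows. The sketch rests on several assumptions that are not stated in the source: vectors are flat lists, order values are integers, each occurrence of e is a uniquely named string, gaps are always created on the right, and "reversed" is read as a larger order value appearing to the left of a smaller one. All function names are hypothetical.

```python
def insert_whole(alpha, beta, m):
    """Insert beta as one block to the right of the m-th element from the right."""
    cut = len(alpha) - m + 1
    return alpha[:cut] + beta + alpha[cut:]


def all_final_vectors(vectors):
    """Enumerate every final single-row vector, keeping one copy of duplicates."""
    results = set()  # a set silently drops identical final vectors

    def grow(current, remaining):
        if not remaining:
            results.add(tuple(current))
            return
        for pos, beta in enumerate(remaining):        # any unused vector
            rest = remaining[:pos] + remaining[pos + 1:]
            for m in range(1, len(current) + 1):      # any element as gap anchor
                grow(insert_whole(current, beta, m), rest)

    for pos, first in enumerate(vectors):             # any starting vector
        grow(list(first), vectors[:pos] + vectors[pos + 1:])
    return results


def has_reversed_order_values(vector):
    """True if some larger order value precedes a smaller one."""
    values = [x for x in vector if isinstance(x, int)]
    return any(a > b for a, b in zip(values, values[1:]))


def valid_final_vectors(vectors):
    """Final single-row vectors that survive the reversed-order check."""
    return {v for v in all_final_vectors(vectors)
            if not has_reversed_order_values(v)}


survivors = valid_final_vectors([[1, "eA"], [2, 3]])
print(len(survivors))
```

If `valid_final_vectors` returns an empty set, the possible matrix solution would be discarded under the rule above; otherwise the surviving vectors are kept.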
[B 3.8.4] Summary and comparison of the two methods:
It can be proved mathematically that all schemes and all steps of the two methods above are finite, fixed, and enumerable. Formulas can be given for the number of concrete schemes and of steps, and for the number of final single-row vectors generated by exhaustive enumeration. Each method constructs a corresponding set of full-permutation mappings and a corresponding recursive function as its mathematical model. Every stage of both methods has a rigorous mathematical basis and a rigorous proof. Both methods fully conform to natural law.
Both methods are built on the following principle and are concrete implementations of it:
The order-value sequence 1, 2, ..., k of the original sentence can be regarded as the result of equivalent substitution of the syntactic vectors in the possible matrix solution whose position can be determined, together with a finite number of whole-vector gap insertions among the syntactic vectors whose position cannot be determined. That is, the initial syntactic vector corresponding to the original sentence,
Figure PCTCN2015083760-appb-000328
can be regarded as obtained by first applying equivalent substitution to the syntactic vectors in the possible matrix solution that have a definite position, and then applying a finite number of whole-vector gap insertions among the syntactic vectors that have no definite position. The different insertion cases are, in essence, the permutations and combinations of combinatorics.
Both methods satisfy the principle above, and their final results are identical. The two methods are therefore equivalent.
Choice of the syntactic element at which a gap is created: the first method restricts this choice, which in effect requires that the order of the inserted syntactic vectors be preserved; the second method does not restrict it, which in effect does not require that order to be preserved.
Final single-row vectors: treating occurrences of e at different positions in the possible matrix solution as distinct, the final single-row vectors generated by the first method are pairwise distinct, whereas those generated by the second method may contain duplicates, so the redundant duplicates must be deleted.
The two methods are described in detail below.
[B 3.8.5] Detailed description of the first gap-insertion method
[B 3.8.5.1] Constructing the mappings that serve as scheme patterns
Note: a coordinated noun/pronoun combination vector and a connective combination vector are each treated as an indivisible whole; no other syntactic vector may be inserted into them.
The order-value sequence 1, 2, ..., k of the original sentence can be regarded as the result of equivalent substitution of the syntactic vectors in the possible matrix solution whose position can be determined, together with a finite number of whole-vector gap insertions among the syntactic vectors whose position cannot be determined. That is, the initial syntactic vector corresponding to the original sentence,
Figure PCTCN2015083760-appb-000329
can be regarded as obtained by first applying equivalent substitution to the syntactic vectors in the possible matrix solution that have a definite position, and then applying a finite number of whole-vector gap insertions among the syntactic vectors that have no definite position. The different insertion cases are, in essence, the permutations and combinations of combinatorics.
The one-sided order-preserving whole-vector gap-insertion method is described in detail below. It precisely characterizes every case of the finitely many whole-vector gap insertions among the syntactic vectors in the possible matrix solution that have no definite position.
Convert the original possible matrix solution into θ new syntactic vectors that have no definite position,
Figure PCTCN2015083760-appb-000330
denoted
Figure PCTCN2015083760-appb-000331
Form all permutations of these θ syntactic vectors. By elementary combinatorics, the number of such permutations is
Figure PCTCN2015083760-appb-000332
that is, the permutations yield $\theta!$ ordered θ-tuples in total. The set of these $\theta!$ ordered θ-tuples is denoted
Figure PCTCN2015083760-appb-000333
($\theta \ge 2$)
Construct pairwise distinct θ-element mappings $\rho_j$, $j \in N$, $1 \le j \le \theta!$. Each mapping $\rho_j$ maps the set $\{t_1, t_2, \ldots, t_\theta\}$ to the set
Figure PCTCN2015083760-appb-000334
The set $\{t_1, t_2, \ldots, t_\theta\}$ serves only as the domain of $\rho_j$ and thereby helps index the full permutations of the set Φ; it has no other meaning. The mappings are constructed as follows:
Figure PCTCN2015083760-appb-000335
Figure PCTCN2015083760-appb-000336
$j \in N$, $1 \le j \le \theta!$; for any $j_1$ and $j_2$ with $j_1 \in N$, $j_2 \in N$, $1 \le j_1 \le \theta!$, $1 \le j_2 \le \theta!$, if $j_1 \ne j_2$ then $\rho_{j_1} \ne \rho_{j_2}$. ($\theta \ge 2$)
For the θ-element mapping $\rho_j$, the following holds:
Figure PCTCN2015083760-appb-000337
Figure PCTCN2015083760-appb-000338
Clearly, for every $\rho_j(t_k)$ there exists
Figure PCTCN2015083760-appb-000339
$1 \le \delta \le \theta$, such that
Figure PCTCN2015083760-appb-000340
That is, every $\rho_j(t_k)$ designates, in the set
Figure PCTCN2015083760-appb-000341
one syntactic vector
Figure PCTCN2015083760-appb-000342
($\theta \ge 2$)
By the construction above, the finite set $\Omega = \{\rho_1, \rho_2, \rho_3, \rho_4, \rho_5, \rho_6, \ldots, \rho_{\theta!}\}$ formed by the $\theta!$ θ-element mappings $\rho_j$ characterizes the full permutations of the finite set
Figure PCTCN2015083760-appb-000343
($\theta \ge 2$)
Tabular description of the set $\pi = \{\rho_1, \rho_2, \rho_3, \rho_4, \rho_5, \rho_6, \ldots, \rho_{\theta!}\}$: ($\theta \ge 2$)
Figure PCTCN2015083760-appb-000344
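The full-permutation construction above can be illustrated with a short Python sketch. It assumes that each mapping corresponds to one ordering of the θ vectors; the labels `v1` to `v4` and the function name `scheme_patterns` are placeholders introduced here, and the order in which `itertools.permutations` lists the orderings need not match the index j used in the table.

```python
from itertools import permutations
from math import factorial


def scheme_patterns(vectors):
    """Return every ordering of the given vectors, one tuple per ordering."""
    return list(permutations(vectors))


labels = ["v1", "v2", "v3", "v4"]        # placeholder names, theta = 4
patterns = scheme_patterns(labels)
print(len(patterns) == factorial(len(labels)))
```

For θ = 4 this produces 24 orderings, matching the count θ! stated above.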
Definition 1.1: Any ordering of all θ second-type syntactic vectors used in one complete pass, from the selection of a second-type syntactic vector
Figure PCTCN2015083760-appb-000345
to the generation of the final single-row vector, is called a scheme pattern. For example, if
Figure PCTCN2015083760-appb-000346
is taken as the scheme pattern, then step 1 selects the vectors
Figure PCTCN2015083760-appb-000347
and
Figure PCTCN2015083760-appb-000348
and inserts the vector
Figure PCTCN2015083760-appb-000349
by one-sided order-preserving whole-vector gap insertion into the vector
Figure PCTCN2015083760-appb-000350
Step 2 selects
Figure PCTCN2015083760-appb-000351
and inserts
Figure PCTCN2015083760-appb-000352
by one-sided order-preserving whole-vector gap insertion into the new syntactic vector generated by step 1; and so on; step θ selects the vector
Figure PCTCN2015083760-appb-000353
and inserts
Figure PCTCN2015083760-appb-000354
by one-sided order-preserving whole-vector gap insertion into the new syntactic vector generated by step (θ-1). Clearly, every θ-element mapping $\rho_j$ is a scheme pattern, and the set of all θ-element mappings $\rho_j$
Figure PCTCN2015083760-appb-000355
is the set of all scheme patterns, so there are $\theta!$ scheme patterns in total. ($\theta \ge 2$)
Definition 1.2: Any single operation that performs a gap insertion and generates a new vector under a given scheme pattern is called a step of that scheme pattern. For any scheme pattern $\rho_j$, the k-th step performed under $\rho_j$ is denoted
Figure PCTCN2015083760-appb-000356
where $[n_k]$ indicates that step k offers $n_k$ alternative cases in total.
Definition 1.3: Choosing one specific case for every step performed under a scheme pattern $\rho_j$, and then combining all the steps, gives a concrete scheme. A concrete scheme is denoted
Figure PCTCN2015083760-appb-000357
where $\rho_j$ is the scheme pattern on which the concrete scheme is based, and $i_k$ indicates that step k takes the $i_k$-th case available at that step.
[B 3.8.5.2] Constructing the gap-insertion recursive function
Next, for an arbitrary scheme pattern $\rho_j$, a gap-insertion recursive algorithm
Figure PCTCN2015083760-appb-000358
is constructed. This recursive algorithm characterizes the concrete operation of every one-sided order-preserving whole-vector gap insertion described above. Before it is constructed, the following five definitions are given as preliminaries:
The gap-insertion recursive algorithm
Figure PCTCN2015083760-appb-000359
constructed below is exactly the k-th step performed under scheme pattern $\rho_j$ as described above. Here k is the number of times the gap-insertion recursive algorithm
Figure PCTCN2015083760-appb-000360
has been run, i.e. the number of one-sided order-preserving whole-vector gap insertions performed.
Definition 1.4: Given any syntactic vector α, the unary function W fetches and labels α. $W(\alpha) = \alpha_k$ means: fetch the syntactic vector α and label it $\alpha_k$; $\alpha_k$ is called the input vector.
Definition 1.5: Given any syntactic vector β, the unary function Q fetches and labels β. $Q(\beta) = \beta_k$ means: fetch the syntactic vector β and label it $\beta_k$; $\beta_k$ is called the insertion vector. While the gap-insertion recursive algorithm
Figure PCTCN2015083760-appb-000361
runs, the syntactic vector $\beta_k$ is inserted into the syntactic vector $\alpha_k$.
Definition 1.6: The binary function Z labels the syntactic vector $\alpha_k$ with order values. The first syntactic element from the right of $\alpha_k$ receives order value 1, and order values 2, 3, ... are then assigned from right to left, stopping at the first syntactic element from the left of the vector $\beta_{k-1}$ contained in $\alpha_k$. Denote that element by $\lambda(\beta_{k-1})$ and its order value by $n_k$; then $n_k$ is the largest order value assigned. Formally, let $\alpha_k = b \ldots \lambda(\beta_{k-1}) \ldots b_2 b_1$, where $\lambda(\beta_{k-1})$ is the first element from the left of $\beta_{k-1}$. In step k, running the binary function $Z_k(\alpha, \beta)$ on $\alpha_k$ gives $\lambda(\beta_{k-1})\langle n_k \rangle \ldots b_2\langle 2 \rangle\, b_1\langle 1 \rangle$. This result is written as the vector $[\alpha_k \backslash \lambda(\beta_{k-1})]$ and is called the truncated vector. The notation
Figure PCTCN2015083760-appb-000362
means that the largest order value assigned to the truncated vector $[\alpha_k \backslash \lambda(\beta_{k-1})]$ is $n_k$. Running the binary function Z gives
Figure PCTCN2015083760-appb-000363
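The labeling and truncation performed by Z in Definition 1.6 can be sketched in Python. This is an assumption-laden illustration rather than the patent's formal definition: the vector is a flat list, the position of the previously inserted vector is passed in as a hypothetical index `prev_start`, and for the first step (no earlier insertion) the whole vector is labeled, which is inferred from $n_1 = 4$ in the worked example of [B 3.8.5.3].

```python
def label_and_truncate(alpha, prev_start):
    """Label alpha from the right and keep only the labeled part.

    alpha: flat list of syntactic elements.
    prev_start: 0-based index in alpha of the first element of the vector
        inserted in the previous step (0 when nothing has been inserted yet).
    Returns (truncated, n_k): truncated is a left-to-right list of
    (element, order value) pairs; n_k is the largest order value assigned.
    """
    tail = alpha[prev_start:]
    n_k = len(tail)
    truncated = [(elem, n_k - offset) for offset, elem in enumerate(tail)]
    return truncated, n_k


# Step 2 of the worked example: the previous insertion starts at "eA" (index 2).
alpha_2 = ["7", "eC", "eA", "5", "6", "eB", "8", "eD"]
truncated, n_2 = label_and_truncate(alpha_2, 2)
print(n_2, truncated[0], truncated[-1])
```

The leftmost pair carries the largest order value and the rightmost pair carries order value 1, as in the expression for the truncated vector.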
Definition 1.7: The binary function T performs whole-vector gap insertion on the truncated vector. After the binary function Z has run, select the $i_k$-th syntactic element from the right of the truncated vector, create exactly one gap on the right side of that element, and insert the vector $\beta_k$ as a whole into that gap. Formally, let $\alpha_k = b \ldots \lambda(\beta_{k-1}) \ldots b_2 b_1$, where $\lambda(\beta_{k-1})$ is the first element from the left of $\beta_{k-1}$. Then the truncated vector is $[\alpha_k \backslash \lambda(\beta_{k-1})] = \lambda(\beta_{k-1})\langle n_k \rangle \ldots b_2\langle 2 \rangle\, b_1\langle 1 \rangle$. In step k, running the binary function $T_k(\alpha, \beta)$ on the vector $[\alpha_k \backslash \lambda(\beta_{k-1})]$ and the vector $\beta_k$ inserts $\beta_k$ as a whole into the gap associated with the $i_k$-th syntactic element from the right of $[\alpha_k \backslash \lambda(\beta_{k-1})]$, giving:
Figure PCTCN2015083760-appb-000364
The new vector obtained by this insertion is denoted
Figure PCTCN2015083760-appb-000365
and the vector
Figure PCTCN2015083760-appb-000366
is called the output vector.
Figure PCTCN2015083760-appb-000367
Figure PCTCN2015083760-appb-000368
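The insertion performed by T in Definition 1.7 can likewise be sketched in Python, under the same assumptions as before (flat lists, gap on the right, a hypothetical index `prev_start` marking where the previously inserted vector begins). The function name and return convention are illustrative only.

```python
def insert_into_truncated(alpha, prev_start, beta, i_k):
    """Insert beta to the right of the i_k-th element from the right of alpha.

    Only elements of the truncated vector (alpha[prev_start:]) may be chosen,
    so i_k must lie between 1 and n_k inclusive.
    Returns (output_vector, start), where start is the index at which beta
    begins in the output vector, for use as prev_start in the next step.
    """
    n_k = len(alpha) - prev_start            # size of the truncated vector
    if not 1 <= i_k <= n_k:
        raise ValueError("i_k must satisfy 1 <= i_k <= n_k")
    cut = len(alpha) - i_k + 1               # position just right of the chosen element
    return alpha[:cut] + beta + alpha[cut:], cut


# Step 2 of the worked example: n_2 = 6, i_2 = 4.
alpha_2 = ["7", "eC", "eA", "5", "6", "eB", "8", "eD"]
output, start = insert_into_truncated(alpha_2, 2, ["1", "2", "3", "4"], 4)
print(output, start)
```

Restricting `i_k` to the truncated vector is what makes the insertion order-preserving: a new vector can never start to the left of the start of the vector inserted before it.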
Definition 1.8: For any syntactic vector α, the number of syntactic elements contained in α is denoted $\sigma[\alpha]$. If α contains n syntactic elements, $n \in N$, then clearly $n = \sigma[\alpha]$.
Note: the definition of the gap-insertion recursive algorithm below contains equations of the following form:
Figure PCTCN2015083760-appb-000369
Here α and β play the role of arguments: they are abstract symbols that may take a range of values. The notation above is therefore not contradictory.
Using the mapping $\rho_j$ and the four functions defined above, the gap-insertion recursive algorithm
Figure PCTCN2015083760-appb-000370
is defined as follows:
Note: take any scheme pattern $\rho_j$; the set
Figure PCTCN2015083760-appb-000371
Figure PCTCN2015083760-appb-000372
Note in particular: when k = 1, the gap-insertion recursive algorithm
Figure PCTCN2015083760-appb-000373
has the initial condition
Figure PCTCN2015083760-appb-000374
given by:
Figure PCTCN2015083760-appb-000375
The recursive algorithm
Figure PCTCN2015083760-appb-000376
decomposes the one-sided order-preserving whole-vector gap insertion described above into four stages: ① fetch the vector in which the gap will be created; ② fetch the insertion vector; ③ label the relevant syntactic elements of the first vector with order values and extract the truncated vector; ④ choose any syntactic element of the truncated vector, create exactly one gap on its right side, and insert the insertion vector as a whole into that gap.
[B 3.8.5.3] Worked example of the gap-insertion recursive algorithm
Figure PCTCN2015083760-appb-000377
Figure PCTCN2015083760-appb-000378
(Occurrences of e at different positions are treated as distinct.)
Let the set
Figure PCTCN2015083760-appb-000379
with $\theta = 4$, $\theta! = 24$, and $\rho_j = \rho_2$.
Let
Figure PCTCN2015083760-appb-000380
Then
Figure PCTCN2015083760-appb-000381
Figure PCTCN2015083760-appb-000382
Let $n_\varepsilon$ be the number of syntactic elements in the truncated vector taken at step ε, and let $i_\varepsilon$ be the ordinal, counted from the right, of the element at which the gap is created at step ε.
To carry out scheme pattern $\rho_2$, the gap-insertion recursive algorithm
Figure PCTCN2015083760-appb-000383
must be run 3 times. Choose the concrete scheme
Figure PCTCN2015083760-appb-000384
<3,4,2>:
Figure PCTCN2015083760-appb-000385
<3,4,2> runs as follows: ($n_1 \in N$, $i_1 \in N$, $1 \le i_1 \le n_1$)
Figure PCTCN2015083760-appb-000386
It is easy to see that $n_1 = 4$; taking $i_1 = 3$ gives: $T_1(\alpha, \beta) = (7\ e_C\ (e_A\ 5\ 6\ e_B)\ 8\ e_D)$
Figure PCTCN2015083760-appb-000387
<3,4,2> runs as follows: ($n_2 \in N$, $i_2 \in N$, $1 \le i_2 \le n_2$)
Figure PCTCN2015083760-appb-000388
It is easy to see that $n_2 = 6$; taking $i_2 = 4$ gives:
$T_2(\alpha, \beta) = (7\ e_C\ (e_A\ 5\ 6\ (1\ 2\ 3\ 4)\ e_B)\ 8\ e_D)$
Figure PCTCN2015083760-appb-000389
<3,4,2> runs as follows: ($n_3 \in N$, $i_3 \in N$, $1 \le i_3 \le n_3$)
Figure PCTCN2015083760-appb-000390
It is easy to see that $n_3 = 7$; taking $i_3 = 2$ gives:
$T_3(\alpha, \beta) = (7\ e_C\ (e_A\ 5\ 6\ (1\ 2\ 3\ 4)\ e_B)\ 8\ (e_E\ 9\ 10\ e_F)\ e_D)$
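The three steps of this worked example can be replayed with a short Python sketch. The four vectors are read off from the parenthesized results $T_1$, $T_2$, $T_3$ above; the flat-list representation (parentheses dropped) and the function name `run_scheme` are assumptions made for illustration.

```python
def run_scheme(vectors, choices):
    """Replay one concrete scheme of one-sided order-preserving insertion.

    vectors: the scheme pattern, i.e. the vectors in insertion order.
    choices: i_k for each step (gap to the right of the i_k-th element
        from the right of the truncated vector).
    Returns (final_vector, sizes) where sizes lists n_k for every step.
    """
    alpha = list(vectors[0])
    start = 0                       # whole vector is labeled at step 1
    sizes = []
    for beta, i_k in zip(vectors[1:], choices):
        n_k = len(alpha) - start    # size of the truncated vector
        if not 1 <= i_k <= n_k:
            raise ValueError("choice outside the truncated vector")
        sizes.append(n_k)
        cut = len(alpha) - i_k + 1
        alpha = alpha[:cut] + list(beta) + alpha[cut:]
        start = cut                 # the new vector begins here
    return alpha, sizes


pattern = [
    ["7", "eC", "8", "eD"],
    ["eA", "5", "6", "eB"],
    ["1", "2", "3", "4"],
    ["eE", "9", "10", "eF"],
]
final, sizes = run_scheme(pattern, [3, 4, 2])
print(sizes)
print(" ".join(final))
```

The computed sizes agree with $n_1 = 4$, $n_2 = 6$, $n_3 = 7$, and the final list is $T_3$ with its parentheses removed.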
[B 3.8.5.4] Exhaustiveness and distinctness of the method
Conclusion 1.1: The full-permutation set of θ-element mappings described above,
Figure PCTCN2015083760-appb-000391
together with the gap-insertion recursive algorithm
Figure PCTCN2015083760-appb-000392
exhausts all possibilities for the finitely many whole-vector gap insertions among the syntactic vectors that have no definite position.
Conclusion 1.2: From the definition of the gap-insertion recursive algorithm
Figure PCTCN2015083760-appb-000393
it follows that any two concrete schemes of this method contain a pair of steps that differ from each other, i.e. no two concrete schemes are identical. Consequently, treating occurrences of e at different positions as distinct, the final single-row vectors generated by all concrete schemes of this method are pairwise distinct syntactic vectors.
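The distinctness claim can be checked on a small case with a brute-force Python sketch. This is a sanity check under assumptions (flat lists, gap on the right, whole vector labeled at step 1, hypothetical function name `final_vectors`), not a proof and not the patent's algorithm as written.

```python
from itertools import permutations


def final_vectors(pattern):
    """All final vectors reachable under one scheme pattern."""
    out = []

    def step(alpha, start, rest):
        if not rest:
            out.append(tuple(alpha))
            return
        beta, later = list(rest[0]), rest[1:]
        n_k = len(alpha) - start                 # size of the truncated vector
        for i_k in range(1, n_k + 1):            # every admissible gap
            cut = len(alpha) - i_k + 1
            step(alpha[:cut] + beta + alpha[cut:], cut, later)

    step(list(pattern[0]), 0, pattern[1:])
    return out


vectors = [("a1", "a2"), ("b1", "b2"), ("c1", "c2")]
everything = [v for p in permutations(vectors) for v in final_vectors(p)]
print(len(everything), len(set(everything)))
```

For these three two-element vectors the enumeration yields 30 final vectors, and the set built from them has the same size, so no duplicates occur in this case.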
[B 3.8.5.5] First group of counting formulas and related proofs
Note: a coordinated noun/pronoun combination vector and a connective combination vector are each treated as an indivisible whole; no other syntactic vector may be inserted into them.
Note: the lemmas and theorems below all assume a given scheme pattern $\rho_j$ of this method and the gap-insertion recursive algorithm
Figure PCTCN2015083760-appb-000394
They also all assume that occurrences of e at different positions in the possible matrix solution are treated as distinct.
Definition 1.9: Run the gap-insertion recursive algorithm
Figure PCTCN2015083760-appb-000395
Suppose n truncated vectors are extracted at step k, say $\alpha_1, \alpha_2, \ldots, \alpha_n$. Whole-vector gap insertion is performed on each of these vectors at step k, and the sum of the numbers of insertion cases over these vectors is denoted $\tau[\sum(k)]$.
Lemma 1.1: (for the definition of σ, see Definition 2.5)
Figure PCTCN2015083760-appb-000396
Proof:
(1) For $k \ge 2$: by Definition 2.4 above, the truncated vector of step k is
Figure PCTCN2015083760-appb-000397
Figure PCTCN2015083760-appb-000398
where the syntactic elements
Figure PCTCN2015083760-appb-000399
are elements contained in the vector $\alpha_k$, and $\langle \beta_{k-1} \rangle$ indicates that the vector $\beta_{k-1}$ occupies the gap associated with the element
Figure PCTCN2015083760-appb-000400
of the vector $\alpha_k$. From the expression for the truncated vector $[\alpha_k \backslash \lambda(\beta_{k-1})]$, the number of elements in this syntactic vector is evidently $\sigma[\beta_k] + (i_k - 1)$, which is the claimed result.
(2) For k = 1: the result follows directly from the definition of the gap-insertion recursive algorithm
Figure PCTCN2015083760-appb-000401
Q.E.D.
Lemma 1.2: For any k with k ∈ N and k ≥ 1, the maximum order value assigned to the syntactic elements of the tail vector intercepted at the k-th step is:
Figure PCTCN2015083760-appb-000402
Proof: In the algorithm
Figure PCTCN2015083760-appb-000403
the recursive definition of the function Z_k(α, β) gives the maximum order value assigned to the syntactic elements of the tail vector [α_k\λ(β_{k-1})] intercepted at the k-th step as:
Figure PCTCN2015083760-appb-000404
By Lemma 1.1, the number of syntactic elements contained in the tail vector [α_k\λ(β_{k-1})] of the k-th step is:
Figure PCTCN2015083760-appb-000405
Therefore the maximum order value assigned to the syntactic elements of the tail vector intercepted at the k-th step is:
Figure PCTCN2015083760-appb-000406
Q.E.D.
Lemma 1.3: The gap-insertion recursive algorithm
Figure PCTCN2015083760-appb-000407
generates specific schemes
Figure PCTCN2015083760-appb-000408
<i_1, i_2, ..., i_k>, whose number is: (for the definition of σ, see Definition 2.5) (k ∈ N, k ≥ 1)
Figure PCTCN2015083760-appb-000409
Proof: By Lemma 1.2, for any k with k ∈ N and k ≥ 1, the maximum order value assigned to the syntactic elements of the tail vector intercepted at the k-th step is:
Figure PCTCN2015083760-appb-000410
In the algorithm
Figure PCTCN2015083760-appb-000411
the recursive definition of the insertion function T_k(α, β) gives the range of i_1 as 1 ≤ i_1 ≤ n_1, that is, 1 ≤ i_1 ≤ σ[α_1].
In the algorithm
Figure PCTCN2015083760-appb-000412
the recursive definition of the insertion function T_k(α, β) gives, for k ≥ 3, the range of i_{k-1} as 1 ≤ i_{k-1} ≤ n_{k-1}, that is, 1 ≤ i_{k-1} ≤ σ[β_{k-2}] + (i_{k-2} − 1).
Without loss of generality, let α_1 = a_{σ[α_1]} ... a_2 a_1 and β_1 = λ(β_1) ... b_2 b_1. Then, by the preceding result and the definition of the gap-insertion recursive algorithm
Figure PCTCN2015083760-appb-000413
the total number of insertion cases at step 1 is σ[α_1]; that is, β_1 can be inserted into α_1 in σ[α_1] ways.
Without loss of generality, let β_2 = λ(β_2) ... c_2 c_1,
Figure PCTCN2015083760-appb-000414
Then, by the preceding result and, in the algorithm
Figure PCTCN2015083760-appb-000415
the recursive definition of the insertion function T_k(α, β), for any given i_1 with 1 ≤ i_1 ≤ σ[α_1] the number of insertion cases at step 2 is σ[β_1] + (i_1 − 1). Since i_1 ranges over all natural numbers in the interval [1, σ[α_1]], summing over all values of i_1 gives the total number of insertion cases at step 2:
Figure PCTCN2015083760-appb-000416
That is, β_2 has
Figure PCTCN2015083760-appb-000417
Figure PCTCN2015083760-appb-000418
insertion cases into the tail vectors.
Without loss of generality, let β_3 = λ(β_3) ... d_2 d_1,
Figure PCTCN2015083760-appb-000419
or
Figure PCTCN2015083760-appb-000420
Figure PCTCN2015083760-appb-000421
Then, by the preceding result and, in the algorithm
Figure PCTCN2015083760-appb-000422
the recursive definition of the insertion function T_k(α, β), for any given i_2 with 1 ≤ i_2 ≤ σ[β_1] + (i_1 − 1) the number of insertion cases at step 3 is σ[β_2] + (i_2 − 1). Since i_2 ranges over all natural numbers in the interval [1, σ[β_1] + (i_1 − 1)] and i_1 ranges over all natural numbers in the interval [1, σ[α_1]], summing over all values of i_2 and i_1 gives the total number of insertion cases at step 3:
Figure PCTCN2015083760-appb-000423
That is, β_3 has
Figure PCTCN2015083760-appb-000424
Figure PCTCN2015083760-appb-000425
insertion cases into the tail vectors.
The following is proved by mathematical induction: at the k-th step, performing whole-vector insertion on all tail vectors yields a total number of insertion cases τ[∑(k)] of: (k ≥ 2)
Figure PCTCN2015083760-appb-000426
Induction hypothesis: at the k-th step, performing whole-vector insertion on all tail vectors yields a total number of insertion cases τ[∑(k)] of:
Figure PCTCN2015083760-appb-000427
In the algorithm
Figure PCTCN2015083760-appb-000428
the recursive definition of the insertion function T_k(α, β) gives, for k ≥ 2, the range of i_k as 1 ≤ i_k ≤ n_k, that is, 1 ≤ i_k ≤ σ[β_{k-1}] + (i_{k-1} − 1).
In the algorithm
Figure PCTCN2015083760-appb-000429
the recursive definition of the insertion function T_k(α, β) shows that, for any given i_k with 1 ≤ i_k ≤ σ[β_{k-1}] + (i_{k-1} − 1), the number of insertion cases at step (k+1) is σ[β_k] + (i_k − 1). Since i_k ranges over all natural numbers in the interval [1, σ[β_{k-1}] + (i_{k-1} − 1)], for any given value of σ[β_{k-1}] + (i_{k-1} − 1), summing over all values of i_k gives the accumulated number of insertion cases at step (k+1):
Figure PCTCN2015083760-appb-000430
That is, for any given value of σ[β_{k-1}] + (i_{k-1} − 1), β_{k+1} has
Figure PCTCN2015083760-appb-000431
insertion cases into the tail vectors.
Furthermore, the induction hypothesis already supplies the expression for σ[β_{k-1}] + (i_{k-1} − 1); that is, the induction hypothesis determines all the values taken by σ[β_{k-1}] + (i_{k-1} − 1). Therefore, summing in the same way, starting from the formula
Figure PCTCN2015083760-appb-000432
and applying the induction-hypothesis formula
Figure PCTCN2015083760-appb-000433
over all values of i_{k-1}, ..., i_2, i_1 eliminates the parameters i_{k-1}, ..., i_2, i_1 and directly gives the total number of insertion cases at step (k+1):
Figure PCTCN2015083760-appb-000434
That is, β_{k+1} has in total
Figure PCTCN2015083760-appb-000435
insertion cases into the tail vectors. The claimed result holds, and the induction is complete.
Combining the above results, the total number of insertion cases τ[∑(k)] at the k-th step is as follows: (k ≥ 1)
Figure PCTCN2015083760-appb-000436
By the definition of the gap-insertion recursive algorithm
Figure PCTCN2015083760-appb-000437
the total number of insertion cases τ[∑(k)] at the k-th step is the total number of insertion cases at the last step of the algorithm
Figure PCTCN2015083760-appb-000438
and it equals the number of specific schemes generated by the algorithm
Figure PCTCN2015083760-appb-000439
that is, the number of specific schemes
Figure PCTCN2015083760-appb-000440
Combining the above results, the gap-insertion recursive algorithm
Figure PCTCN2015083760-appb-000441
generates specific schemes
Figure PCTCN2015083760-appb-000442
whose number is: (for the definition of σ, see Definition 2.5) (k ∈ N, k ≥ 1)
Figure PCTCN2015083760-appb-000443
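The counting argument of Lemma 1.3 can be checked numerically. The following is a minimal sketch, not the patent's own procedure: it relies only on the index ranges stated in the proof, namely 1 ≤ i_1 ≤ σ[α_1] and 1 ≤ i_k ≤ σ[β_{k-1}] + (i_{k-1} − 1) for k ≥ 2, and counts the tuples <i_1, ..., i_k> by direct recursion. The function name count_schemes and its argument layout are illustrative assumptions.

```python
def count_schemes(sigma_alpha1, sigma_betas):
    """Count the tuples <i_1, ..., i_k>, where k = len(sigma_betas) + 1.

    sigma_alpha1 -- sigma[alpha_1], the number of elements of alpha_1
    sigma_betas  -- [sigma[beta_1], ..., sigma[beta_{k-1}]]
    (hypothetical argument layout, chosen for this sketch)
    """
    def rec(step, n):
        # n is the upper bound of the index chosen at this step.
        if step == len(sigma_betas):
            return n  # last step: n admissible values
        # Choosing i at this step leaves sigma_betas[step] + (i - 1)
        # admissible values for the next index.
        return sum(rec(step + 1, sigma_betas[step] + (i - 1))
                   for i in range(1, n + 1))

    return rec(0, sigma_alpha1)


# One step: i_1 ranges over 1..sigma[alpha_1].
print(count_schemes(4, []))
# Two steps: the sum over i_1 of (sigma[beta_1] + i_1 - 1).
print(count_schemes(2, [3]))
```

Enumerating the tuples directly keeps the sketch independent of any closed-form expression, so it can serve as a cross-check on the formulas shown in the figures.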
Theorem 1.1: Let Ω[ρj] denote the number of specific schemes
Figure PCTCN2015083760-appb-000444
generated by any scheme mode ρj of the method. Then Ω[ρj] is given by (formula below): (θ ≥ 2)
Figure PCTCN2015083760-appb-000445
Proof: By the definition of the gap-insertion recursive algorithm
Figure PCTCN2015083760-appb-000446
and Lemma 1.3 applied to that definition, the claimed result clearly holds. Q.E.D.
Theorem 1.2: Let Ω[ρj] denote the number of final single-line vectors generated by any scheme mode ρj of the method. Then Ω[ρj] is given by (formula below): (θ ≥ 2)
Figure PCTCN2015083760-appb-000447
Proof: By the definitions of the final single-line vector and the specific scheme, the number of final single-line vectors equals the number of specific schemes. The result then follows from Theorem 4.1.1. Q.E.D.
[B 3.8.5.6] Example of operation of the first insertion method
Figure PCTCN2015083760-appb-000448
Take the set
Figure PCTCN2015083760-appb-000449
with θ = 4, θ! = 24, and ρj = ρ2.
Take
Figure PCTCN2015083760-appb-000450
then
Figure PCTCN2015083760-appb-000451
Figure PCTCN2015083760-appb-000452
then
Figure PCTCN2015083760-appb-000453
The number of final single-line vectors generated by scheme mode ρ2 is Ω[ρ2] = (formula below):
Figure PCTCN2015083760-appb-000454
When i_1 = 1:
Figure PCTCN2015083760-appb-000455
When i_1 = 2:
Figure PCTCN2015083760-appb-000456
When i_1 = 3:
Figure PCTCN2015083760-appb-000457
When i_1 = 4:
Figure PCTCN2015083760-appb-000458
Figure PCTCN2015083760-appb-000459
The number of final single-line vectors generated by scheme mode ρ2 is: Ω[ρ2] = 140.
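The total can be cross-checked against the index ranges used in the proof of Lemma 1.3. The worked sum below is a sketch under an assumption that the surviving text does not confirm: each of the four vectors is taken to contain 4 syntactic elements, the count that the proof of Theorem 1.3 gives for a vector that has undergone no equal substitution. With θ = 4 there are three insertion steps, and under this assumption the count agrees with Ω[ρ2] = 140.

```latex
\begin{align*}
\Omega[\rho_2] &= \sum_{i_1=1}^{4}\;\sum_{i_2=1}^{4+(i_1-1)} \bigl(4+(i_2-1)\bigr) \\
i_1 = 1:&\quad 4+5+6+7 = 22 \\
i_1 = 2:&\quad 4+5+6+7+8 = 30 \\
i_1 = 3:&\quad 4+5+6+7+8+9 = 39 \\
i_1 = 4:&\quad 4+5+6+7+8+9+10 = 49 \\
\text{total}:&\quad 22+30+39+49 = 140
\end{align*}
```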
The number of final single-line vectors generated by all the scheme modes of the method is:
Figure PCTCN2015083760-appb-000460
Figure PCTCN2015083760-appb-000461
[B 3.8.5.7] The second set of calculation formulas and related proofs
The original possible matrix solution is converted into θ new syntactic vectors whose definite positions cannot be found,
Figure PCTCN2015083760-appb-000462
denoted:
Figure PCTCN2015083760-appb-000463
The newly generated syntactic vectors whose definite positions cannot be found,
Figure PCTCN2015083760-appb-000464
are collectively called second-type syntactic vectors. Any syntactic vector that is eliminated in the aforementioned equal-substitution process is called a predecessor syntactic vector. For any newly generated second-type syntactic vector
Figure PCTCN2015083760-appb-000465
the number of predecessor syntactic vectors f replaced by
Figure PCTCN2015083760-appb-000466
in the aforementioned equal-substitution process is denoted u_ε. Then
Figure PCTCN2015083760-appb-000467
is obtained through u_ε equal substitutions.
For example, the vectors f_2 = e +< f_3 +< 7 +< f_5, f_3 = 3 +< e +< 4 +< e, and f_5 = 8 +< e +< 9 +< 10 generate, through equal substitution, a second-type vector
Figure PCTCN2015083760-appb-000468
Figure PCTCN2015083760-appb-000469
so clearly u_1 = 2; that is,
Figure PCTCN2015083760-appb-000470
is obtained through 2 equal substitutions.
Theorem 1.3: For any second-type syntactic vector
Figure PCTCN2015083760-appb-000471
let the number of syntactic elements contained in
Figure PCTCN2015083760-appb-000472
be denoted
Figure PCTCN2015083760-appb-000473
and let the number of predecessor syntactic vectors f eliminated by the syntactic vector
Figure PCTCN2015083760-appb-000474
be denoted u_ε. Then
Figure PCTCN2015083760-appb-000475
satisfies the recurrence formula:
Figure PCTCN2015083760-appb-000476
Proof: By mathematical induction, as follows:
(1) If u_ε = 0, then the number of predecessor syntactic vectors f eliminated by the syntactic vector
Figure PCTCN2015083760-appb-000477
is 0; that is,
Figure PCTCN2015083760-appb-000478
is a syntactic vector of the original possible matrix solution that has undergone no equal substitution. The number of syntactic elements contained in the syntactic vector
Figure PCTCN2015083760-appb-000479
is 4, so in this case the formula
Figure PCTCN2015083760-appb-000480
clearly holds.
(2) Suppose that
Figure PCTCN2015083760-appb-000481
holds when u_ε = k. In that case the number of predecessor syntactic vectors f eliminated by
Figure PCTCN2015083760-appb-000482
is k, and the number of elements contained in
Figure PCTCN2015083760-appb-000483
is 3k + 4. When u_ε = k + 1, we may regard
Figure PCTCN2015083760-appb-000484
as first eliminating k predecessor syntactic vectors f and then, on that basis,
Figure PCTCN2015083760-appb-000485
removing one of its own syntactic elements while introducing one more predecessor syntactic vector f; that is,
Figure PCTCN2015083760-appb-000486
removes 1 of its own syntactic elements while introducing 4 elements. The number of elements contained in
Figure PCTCN2015083760-appb-000487
is then 3k + 4 + 3, so the formula
Figure PCTCN2015083760-appb-000488
holds. Combining (1) and (2), the claimed result holds.
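The element count can be illustrated on the vectors f_2, f_3, and f_5 from the example preceding Theorem 1.3. The sketch below rests on assumptions: it models each syntactic vector as an ordered Python list, reads the "+<" operator as ordered concatenation, and treats any element that names another vector as a predecessor to be substituted. The helper name expand is hypothetical, and the element order it produces may not match the vector shown in the patent's figures.

```python
def expand(name, vectors):
    """Expand vector `name` by substituting every referenced vector.

    Returns (elements, u): the flat element list and the number of
    substitutions performed. An element that is a key of `vectors`
    stands for a predecessor vector (modeling assumption).
    """
    elements, u = [], 0
    for item in vectors[name]:
        if item in vectors:
            sub_elements, sub_u = expand(item, vectors)
            elements.extend(sub_elements)
            u += sub_u + 1  # this substitution plus any nested ones
        else:
            elements.append(item)
    return elements, u


vectors = {
    "f2": ["e", "f3", 7, "f5"],
    "f3": [3, "e", 4, "e"],
    "f5": [8, "e", 9, 10],
}
elements, u = expand("f2", vectors)
print(elements)
print(u, len(elements), 3 * u + 4)
```

Each substitution removes one placeholder and brings in four elements, a net gain of three, which is why the length of the expanded list agrees with 3u + 4 for this input.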
[B 3.8.5.8] Summary of the counting formulas of the method
Conclusion 1.3: Provided that occurrences of e at different positions are distinguished, the number of specific schemes corresponding to any scheme mode ρj of the method equals the number of final single-line vectors generated by ρj. Both are denoted Ω[ρj], and Ω[ρj] is given by (formula below): (for the definition of σ, see Definition 1.5) (θ ≥ 2)
Figure PCTCN2015083760-appb-000489
Conclusion 1.4: Define d_θ = σ[ρj(t_θ)]. The formula of Conclusion 1.3 then becomes Ω[ρj] = (formula below): (for the definition of σ, see Definition 1.5) (θ ≥ 2)
Figure PCTCN2015083760-appb-000490
Conclusion 1.5: By Theorem 1.3, Ω[ρj] = (formula below): (for the definition of σ, see Definition 2.5) (θ ≥ 1)
Figure PCTCN2015083760-appb-000491
Conclusion 1.6: Define g_θ = 3u_θ + 4. The formula for Ω[ρj] in Conclusion 1.5 then becomes Ω[ρj] = (formula below): (for the definition of σ, see Definition 2.5) (θ ≥ 2)
Figure PCTCN2015083760-appb-000492
Conclusion 1.7: The method produces θ! scheme modes in total during its execution. Therefore, provided that occurrences of e at different positions are distinguished, the number of all specific schemes generated by all the scheme modes of the method equals the number of all final single-line vectors generated, and is given by:
Figure PCTCN2015083760-appb-000493
(θ ≥ 2)
Conclusion 1.8: Combining the preceding conclusions, the method generates finitely many specific schemes and finitely many final single-line vectors during its execution. The number of specific schemes and the number of final single-line vectors are determinate, are given by exact calculation formulas with corresponding proofs, and accord with natural laws.
[B 3.8.5.9] A full demonstration of the first method
Example: take the following possible matrix solution, and first apply equal substitution to the syntactic vectors in it whose definite positions can be found. Let the possible matrix solution be:
Figure PCTCN2015083760-appb-000494
After equal substitution, the original possible matrix solution becomes:
Figure PCTCN2015083760-appb-000495
The set
Figure PCTCN2015083760-appb-000496
is listed as follows: (the set
Figure PCTCN2015083760-appb-000497
θ = 3, θ! = 6)
Figure PCTCN2015083760-appb-000498
Construct the mapping ρj as follows: (θ = 3,
Figure PCTCN2015083760-appb-000499
)
ρj:
Figure PCTCN2015083760-appb-000500
j ∈ N, 1 ≤ j ≤ 6
The set π = {ρ1, ρ2, ρ3, ρ4, ρ5, ρ6} is listed as follows:
Figure PCTCN2015083760-appb-000501
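The six scheme modes can be enumerated mechanically. The sketch below assumes that each scheme mode corresponds to one ordering of the θ vectors, which is consistent with the count θ! = 6 but is not confirmed by the surviving text. The names t1, t2, t3 are placeholders, and the numbering of the orderings produced here need not match the patent's table of ρ1 to ρ6.

```python
from itertools import permutations
from math import factorial

# Placeholder names for the theta = 3 vectors (illustrative only).
vectors = ["t1", "t2", "t3"]

# Assumption: one scheme mode per ordering of the vectors.
modes = list(permutations(vectors))
for j, order in enumerate(modes, start=1):
    print(f"mode {j}: {order}")

# The number of orderings is theta!.
assert len(modes) == factorial(len(vectors))
```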
To execute scheme mode ρ1, the recursive algorithm
Figure PCTCN2015083760-appb-000502
must be run 2 times.
Figure PCTCN2015083760-appb-000503
runs as follows: (n_1 ∈ N, i_1 ∈ N, 1 ≤ i_1 ≤ n_1)
Figure PCTCN2015083760-appb-000504
Figure PCTCN2015083760-appb-000505
runs as follows: (n_2 ∈ N, i_2 ∈ N, 1 ≤ i_2 ≤ n_2)
Figure PCTCN2015083760-appb-000506
According to scheme mode ρ1,
Figure PCTCN2015083760-appb-000507
is listed as follows:
Figure PCTCN2015083760-appb-000508
According to scheme mode ρ1,
Figure PCTCN2015083760-appb-000509
is listed as follows:
Figure PCTCN2015083760-appb-000510
Figure PCTCN2015083760-appb-000511
Figure PCTCN2015083760-appb-000512
To execute scheme mode ρ2, the recursive algorithm
Figure PCTCN2015083760-appb-000513
must be run 2 times.
Figure PCTCN2015083760-appb-000514
runs as follows: (n_1 ∈ N, i_1 ∈ N, 1 ≤ i_1 ≤ n_1)
Figure PCTCN2015083760-appb-000515
Figure PCTCN2015083760-appb-000516
runs as follows: (n_2 ∈ N, i_2 ∈ N, 1 ≤ i_2 ≤ n_2)
Figure PCTCN2015083760-appb-000517
According to scheme mode ρ2,
Figure PCTCN2015083760-appb-000518
is listed as follows:
Figure PCTCN2015083760-appb-000519
According to scheme mode ρ2,
Figure PCTCN2015083760-appb-000520
is listed as follows:
Figure PCTCN2015083760-appb-000521
Figure PCTCN2015083760-appb-000522
To execute scheme mode ρ3, the recursive algorithm
Figure PCTCN2015083760-appb-000523
must be run 2 times.
Figure PCTCN2015083760-appb-000524
runs as follows: (n_1 ∈ N, i_1 ∈ N, 1 ≤ i_1 ≤ n_1)
Figure PCTCN2015083760-appb-000525
Figure PCTCN2015083760-appb-000526
runs as follows: (n_2 ∈ N, i_2 ∈ N, 1 ≤ i_2 ≤ n_2)
Figure PCTCN2015083760-appb-000527
According to scheme mode ρ3,
Figure PCTCN2015083760-appb-000528
is listed as follows:
Figure PCTCN2015083760-appb-000529
According to scheme mode ρ3,
Figure PCTCN2015083760-appb-000530
is listed as follows:
Figure PCTCN2015083760-appb-000531
Figure PCTCN2015083760-appb-000532
Figure PCTCN2015083760-appb-000533
Figure PCTCN2015083760-appb-000534
The execution of scheme modes ρ4 to ρ6 is omitted.
The key information of the gap-insertion recursive algorithm
Figure PCTCN2015083760-appb-000535
is summarized in the following table:
Figure PCTCN2015083760-appb-000536
The above formulas can also be expressed in terms of the number of predecessor syntactic vectors.
Table of the number of predecessor syntactic vectors and the number of elements in each second-type syntactic vector:
Figure PCTCN2015083760-appb-000537
The key information of the gap-insertion recursive algorithm, expressed in terms of the number of predecessor syntactic vectors, is tabulated as follows:
Figure PCTCN2015083760-appb-000538
The check of whether any final single-line vector contains two order values in reversed positions is omitted.
This completes the full demonstration of the example.
[B 3.8.6] Detailed description of the second insertion method
The aforementioned one-sided, non-order-preserving whole-vector insertion method is described in detail below. This method precisely characterizes every case of finitely many whole-vector insertions among the syntactic vectors in the possible matrix solution whose definite positions cannot be found.
[B 3.8.6.1] Constructing the gap-insertion recursive function
For the definitions of scheme mode, specific scheme, and step, refer to the first method. What follows describes where the second method differs from the first. Construct a gap-insertion recursive algorithm
Figure PCTCN2015083760-appb-000539
This recursive algorithm characterizes the specific process of each one-sided, non-order-preserving whole-vector insertion described above. Before the recursive algorithm is constructed, the following 5 definitions are given as preliminaries:
The gap-insertion recursive algorithm
Figure PCTCN2015083760-appb-000540
to be constructed below is the k-th step executed according to scheme mode ρj, where k is the number of times the gap-insertion recursive algorithm
Figure PCTCN2015083760-appb-000541
has been run, that is, the number of times the aforementioned one-sided, non-order-preserving whole-vector insertion has been performed.
Definition 2.1: For any syntactic vector α, the unary function W takes out and labels α. W(α) = α_k means that the syntactic vector α is taken out and labeled α_k.
Definition 2.2: For any syntactic vector β, the unary function Q takes out and labels β. Q(β) = β_k means that the syntactic vector β is taken out and labeled β_k. While the gap-insertion recursive algorithm
Figure PCTCN2015083760-appb-000542
runs, the syntactic vector β_k is inserted into the syntactic vector α_k.
Definition 2.3: The unary function Z assigns order values to the syntactic vector α_k: the first syntactic element from the left of α_k is assigned order value 1, and the following elements are assigned order values 2, 3, ... from left to right until every syntactic element of α_k has been labeled. The largest order value assigned is denoted n_k. Running the unary function Z yields:
Figure PCTCN2015083760-appb-000543
Definition 2.4: After the unary function Z has assigned order values to the syntactic vector α_k, the binary function T selects the m_k-th element from the left of α_k, constructs a single gap to the right of that element, and inserts the syntactic vector β_k into that gap as a whole. The new vector obtained after the insertion is denoted:
Figure PCTCN2015083760-appb-000544
Running the binary function T yields:
Figure PCTCN2015083760-appb-000545
Definition 2.5: For any syntactic vector α, the number of syntactic elements contained in α is denoted σ[α]. If α contains n syntactic elements, n ∈ N, then clearly n = σ[α].
Note: The definition of the gap-insertion recursive algorithm below contains an equation of the following form:
Figure PCTCN2015083760-appb-000546
Here α and β are independent variables, abstract symbols that may take a wide range of values, so this notation is not contradictory.
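The labeling function Z and the insertion function T of Definitions 2.3 and 2.4 can be sketched on plain lists. This is an illustrative reading of the definitions, not the patent's algorithm: syntactic vectors are modeled as Python lists, the helper names label and insert_whole are hypothetical, and the range 1 ≤ m ≤ n for the gap position is an assumption, since the formal ranges appear only in the figures.

```python
def label(alpha):
    """Z: pair each element of alpha with its order value 1..n,
    counted from the left."""
    return list(enumerate(alpha, start=1))


def insert_whole(alpha, beta, m):
    """T: insert beta, as one block, into the gap to the right of
    the m-th element (counted from the left) of alpha."""
    n = len(alpha)
    if not 1 <= m <= n:
        raise ValueError(f"m must satisfy 1 <= m <= {n}")
    return alpha[:m] + beta + alpha[m:]


alpha = ["a1", "a2", "a3"]
beta = ["b1", "b2"]
print(label(alpha))
for m in range(1, len(alpha) + 1):
    print(m, insert_whole(alpha, beta, m))
```

Because beta is spliced in as a contiguous block and nothing is dropped, every result of insert_whole has len(alpha) + len(beta) elements, and the relative order within alpha and within beta is unchanged.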
Next, based on the mapping ρj and the 4 functions described above, the gap-insertion recursive algorithm
Figure PCTCN2015083760-appb-000547
is defined as follows: (the set
Figure PCTCN2015083760-appb-000548
)
Figure PCTCN2015083760-appb-000549
[B 3.8.6.2] Formulas for the number of specific schemes and final single-line vectors
Lemma 2.1:
Figure PCTCN2015083760-appb-000550
Proof: By the preceding definitions, while the recursive algorithm
Figure PCTCN2015083760-appb-000551
runs, no syntactic element is added to or removed from the gap-hosting vector α_k or the inserted vector β_k. That is, for any syntactic element b:
① If b ∈ α_k or b ∈ β_k, then
Figure PCTCN2015083760-appb-000552
② If
Figure PCTCN2015083760-appb-000553
and
Figure PCTCN2015083760-appb-000554
then
Figure PCTCN2015083760-appb-000555
From ① and ②, it clearly follows that:
Figure PCTCN2015083760-appb-000556
Q.E.D.
Lemma 2.2: Let (k∈N, k≥1)
Figure PCTCN2015083760-appb-000557
The syntactic vector ΣΨ_k denotes the syntactic vector obtained from ρj(t_1), ρj(t_2), ..., ρj(t_k) by successive one-sided, non-order-preserving whole gap insertion. m_1, m_2, ..., m_{k-1} each denote an arbitrary gap position number of the corresponding vector. Then the following holds:
Figure PCTCN2015083760-appb-000558
(Note:
Figure PCTCN2015083760-appb-000559
)
Proof: By mathematical induction:
(1) If k=1,
Figure PCTCN2015083760-appb-000560
so the conclusion holds.
(2) Assume the conclusion holds for k=h; then
Figure PCTCN2015083760-appb-000561
When k=h+1,
Figure PCTCN2015083760-appb-000562
so that:
Figure PCTCN2015083760-appb-000563
By Lemma 2.1:
Figure PCTCN2015083760-appb-000564
By the induction hypothesis:
Figure PCTCN2015083760-appb-000565
Hence:
Figure PCTCN2015083760-appb-000566
and further:
Figure PCTCN2015083760-appb-000567
Q.E.D.
Theorem 2.1: Let Ω[ρj] denote the number of concrete schemes generated by any scheme pattern ρj of the present method. Then:
Figure PCTCN2015083760-appb-000568
Proof: By the definition of the gap-insertion recursive algorithm, for any scheme pattern ρj the number of gap-insertion cases at step k equals the number of syntactic elements of the gap-creating vector α_k of step k. By the foregoing definitions, clearly α_k=ΣΨ_k, so by Lemma 2.2 step k of scheme pattern ρj has
Figure PCTCN2015083760-appb-000570
cases. Since every scheme pattern ρj comprises (θ-1) steps, the multiplication principle of combinatorics shows that every scheme pattern ρj of the present method corresponds to
Figure PCTCN2015083760-appb-000571
concrete schemes.
Q.E.D.
Theorem 2.2: Let Ω[ρj] denote the number of final single-row vectors generated by any scheme pattern ρj of the present method. Then:
Figure PCTCN2015083760-appb-000572
Proof: By the definitions of final single-row vector and concrete scheme, the number of final single-row vectors equals the number of concrete schemes; the result then follows from Theorem 2.1.
Conclusion 2.1: The present method has θ! scheme patterns in total. By Theorem 2.1 and the addition principle of combinatorics, the total number of concrete schemes is:
Figure PCTCN2015083760-appb-000573
Conclusion 2.2: The present method has θ! scheme patterns in total. By Theorem 2.2 and the addition principle of combinatorics, the total number of final single-row vectors is:
Figure PCTCN2015083760-appb-000574
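The counting argument of Theorems 2.1 and 2.2 can be illustrated numerically. The formulas themselves are only available as figures here, so the sketch below rests on an assumption drawn from the proof text: the number of cases at step k equals the number of elements of ΣΨ_k, taken to be the sum of the element counts of the first k vectors of the pattern. The function names and the `sizes` argument are illustrative and do not come from the source.

```python
from itertools import permutations


def schemes_for_pattern(sizes):
    """Count concrete schemes for one scheme pattern.

    sizes[i] is the number of syntactic elements of the (i+1)-th vector in
    the order fixed by the pattern. Assumption: step k offers as many gap
    positions as the vector built so far has elements.
    """
    running = 0  # elements accumulated in the gap-creating vector
    count = 1
    for size in sizes[:-1]:  # a pattern of theta vectors has theta - 1 steps
        running += size
        count *= running  # multiplication principle across steps
    return count


def total_schemes(sizes):
    """Sum over all theta! scheme patterns (addition principle)."""
    return sum(schemes_for_pattern(p) for p in permutations(sizes))


if __name__ == "__main__":
    print(schemes_for_pattern([1, 2, 3]))
    print(total_schemes([1, 2, 3]))
```

Under the same assumption, the totals count concrete schemes before any duplicate final single-row vectors are removed.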
[B 3.8.6.3] Worked example of the second method
Example: take the following possible matrix solution and first perform equal substitution for those syntactic vectors in it whose positions can be determined unambiguously. Let the possible matrix solution be:
Figure PCTCN2015083760-appb-000575
After equal substitution, the original possible matrix solution becomes:
Figure PCTCN2015083760-appb-000576
The set
Figure PCTCN2015083760-appb-000577
is listed as follows: set
Figure PCTCN2015083760-appb-000578
θ=3, θ!=6
Figure PCTCN2015083760-appb-000579
Construct the mapping ρj as follows: (θ=3,
Figure PCTCN2015083760-appb-000580
)
ρj:
Figure PCTCN2015083760-appb-000581
j∈N, 1≤j≤6
The set π={ρ1, ρ2, ρ3, ρ4, ρ5, ρ6} is listed as follows:
Figure PCTCN2015083760-appb-000582
To execute scheme pattern ρ1, the step recursive algorithm
Figure PCTCN2015083760-appb-000583
must be run twice.
Figure PCTCN2015083760-appb-000584
runs as follows:
Figure PCTCN2015083760-appb-000585
Figure PCTCN2015083760-appb-000586
runs as follows:
Figure PCTCN2015083760-appb-000587
Following scheme pattern ρ1, the results of running the gap-insertion function
Figure PCTCN2015083760-appb-000588
are listed below:
Figure PCTCN2015083760-appb-000589
Following scheme pattern ρ1, the results of running the gap-insertion function
Figure PCTCN2015083760-appb-000590
are listed below:
Figure PCTCN2015083760-appb-000591
Figure PCTCN2015083760-appb-000592
Figure PCTCN2015083760-appb-000593
The execution of scheme patterns ρ2 to ρ6 is omitted.
The key information of the gap-insertion recursive algorithm
Figure PCTCN2015083760-appb-000594
is summarized in the following table:
Figure PCTCN2015083760-appb-000595
Figure PCTCN2015083760-appb-000596
Where two or more final single-row vectors are identical, one is kept and the redundant duplicates are deleted. This finally yields 210 pairwise distinct final single-row vectors, which agrees exactly with the result of Method 1.
The check of each final single-row vector for two order values in reversed positions is omitted.
End of the worked example.
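The enumeration and de-duplication described in this example can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the procedure of the figures: the unary function Z is omitted because its definition is not reproduced here, a gap is taken to lie immediately to the right of the selected element, and all function names are hypothetical. It does not reproduce the 210 vectors of the example, whose input is only given as a figure.

```python
from itertools import permutations


def whole_gap_insert(alpha, m, beta):
    """Insert beta as one block right after the m-th (1-based) element of alpha."""
    if not 1 <= m <= len(alpha):
        raise ValueError("m must select an existing element of alpha")
    return alpha[:m] + beta + alpha[m:]


def final_vectors_for_pattern(ordered):
    """All final single-row vectors for one ordering of the input vectors."""
    results = [ordered[0]]
    for beta in ordered[1:]:
        # Each step tries every gap position of every vector built so far.
        results = [
            whole_gap_insert(alpha, m, beta)
            for alpha in results
            for m in range(1, len(alpha) + 1)
        ]
    return results


def all_final_vectors(vectors):
    """Final single-row vectors over all orderings, duplicates included."""
    out = []
    for pattern in permutations(vectors):
        out.extend(final_vectors_for_pattern(pattern))
    return out


if __name__ == "__main__":
    vectors = [("a",), ("b",), ("c",)]
    generated = all_final_vectors(vectors)
    distinct = set(generated)  # keep one copy of each identical vector
    print(len(generated), len(distinct))
```

Because different scheme patterns can produce the same final single-row vector, the de-duplicated set is generally smaller than the raw list, which is why the example removes duplicates before comparing with Method 1.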
Part C: Application examples
Part C1: Example 1
Example 1: Preprocessing removes the impurities from a sentence and identifies, labels and numbers its word units. For example, for the English sentence S = "I can completely understand what what you just said really meant", removing the impurities yields S = "I can understand what what you said meant". After word-unit recognition, type labeling and numbering, a data structure matching the table below is obtained.
Word unit | Word unit type | Number
I | noun/pronoun unit | 1
can understand | predicate verb unit | 2
what A | subordinating connective unit | 3
what B | subordinating connective unit | 4
you | noun/pronoun unit | 5
said | predicate verb unit | 6
meant | predicate verb unit | 7
The present invention performs syntactic analysis on the preprocessed sentence represented by the above data structure, in order to obtain the constituent relationships of the word units within the sentence.
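For illustration, the preprocessed sentence in the table above could be held in a structure like the following. This is a minimal sketch; the class names, field names and type labels are assumptions made for the example and are not specified by the source.

```python
from dataclasses import dataclass
from enum import Enum


class UnitType(Enum):
    NOUN_PRONOUN = "noun/pronoun unit"
    PREDICATE_VERB = "predicate verb unit"
    SUBORDINATING = "subordinating connective unit"
    COORDINATING = "coordinating connective unit"


@dataclass(frozen=True)
class WordUnit:
    number: int      # position of the unit in the preprocessed sentence
    text: str        # surface form; "what A"/"what B" tell the two "what"s apart
    type: UnitType


SENTENCE = [
    WordUnit(1, "I", UnitType.NOUN_PRONOUN),
    WordUnit(2, "can understand", UnitType.PREDICATE_VERB),
    WordUnit(3, "what A", UnitType.SUBORDINATING),
    WordUnit(4, "what B", UnitType.SUBORDINATING),
    WordUnit(5, "you", UnitType.NOUN_PRONOUN),
    WordUnit(6, "said", UnitType.PREDICATE_VERB),
    WordUnit(7, "meant", UnitType.PREDICATE_VERB),
]


def predicate_units(units):
    """Return the predicate verb units r_1, ..., r_n in sentence order."""
    return [u for u in units if u.type is UnitType.PREDICATE_VERB]


if __name__ == "__main__":
    for k, unit in enumerate(predicate_units(SENTENCE), start=1):
        print(f"r_{k} = {unit.text!r} (number {unit.number})")
```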
FIG. 1 is a flowchart of a computer-based method for parsing natural language syntactic structures according to an embodiment of the present invention. As shown in FIG. 1, the method includes:
Step 110: Read the preprocessed sentence data structure to be parsed. The preprocessed sentence data structure contains only the connective units, predicate verb units and noun/pronoun units of the sentence, and each word unit is numbered according to its order in the preprocessed sentence and labeled with its type.
Step 120: For each predicate verb unit, generate the corresponding lead element, subject element, predicate element and object element. The possible values of the lead element are: one of the coordinating connective units or subordinating connective units whose number is smaller than that of the corresponding predicate verb unit; or one of the connective combination vectors, each formed by a coordinating connective unit whose number is smaller than that of the corresponding predicate verb unit together with an adjacent subordinating connective unit whose number is smaller than that of the corresponding predicate verb unit and greater than that of the coordinating connective unit; or the empty unit.
The possible values of the subject element are: one of the noun/pronoun units whose number is smaller than that of the corresponding predicate verb unit; or one of the coordinated noun/pronoun combination vectors contained in the family of all coordinated noun/pronoun combination vectors whose largest word unit has a number smaller than that of the corresponding predicate verb unit; or one of the syntactic vectors corresponding to a preceding predicate element; or the empty unit.
The predicate element is the corresponding predicate verb unit.
The possible values of the object element are: one of the noun/pronoun units whose number is greater than that of the corresponding predicate verb unit and smaller than that of the next following predicate verb unit; or one of the coordinated noun/pronoun combination vectors contained in the family of all coordinated noun/pronoun combination vectors whose smallest word unit has a number greater than that of the corresponding predicate verb unit and smaller than that of the next following predicate verb unit; or one of the syntactic vectors corresponding to a following predicate element; or the empty unit.
Specifically, for a preprocessed sentence, let the total number of predicate verb units be n. Since a predicate verb unit can only serve as a predicate, each predicate verb unit corresponds to exactly one predicate element. Denote the predicate verb units by r_k, k=1,…,n.
Once the predicate elements have been obtained, the corresponding lead element, subject element and object element are generated from the position number of each predicate element.
I. Lead element
Denote the set of connective units corresponding to each predicate verb unit r_k by:
{x_k} = Lead_k ∪ conj_k ∪ (conj_k ∘ Lead_k) ∪ {e}
Denote the lead element corresponding to the predicate verb unit r_k by x_k, with set of possible values {x_k}. Generating the set of possible values of the lead element x_k corresponding to r_k preferably proceeds as follows:
The possible values of the lead element are: one of the coordinating connective units or subordinating connective units whose number is smaller than that of the corresponding predicate verb unit; or one of the connective combination vectors, each formed by a coordinating connective unit whose number is smaller than that of the corresponding predicate verb unit together with an adjacent subordinating connective unit whose number is smaller than that of the corresponding predicate verb unit and greater than that of the coordinating connective unit; or the empty unit.
That is, x_k ∈ Lead_k ∪ conj_k ∪ (conj_k ∘ Lead_k) ∪ {e}.
In the above formula, Lead_k denotes the set of subordinating connective units whose number is smaller than that of the corresponding predicate verb unit; conj_k denotes the set of coordinating connective units whose number is smaller than that of the corresponding predicate verb unit; (conj_k ∘ Lead_k) denotes the set of connective combination vectors, each formed by a coordinating connective unit whose number is smaller than that of the corresponding predicate verb unit together with an adjacent subordinating connective unit whose number is smaller than that of the corresponding predicate verb unit and greater than that of the coordinating connective unit; e denotes the empty unit.
For example, for the preprocessed sentence S = "I can understand what what you said meant" shown in Table 1 above:
r_1 = "can understand": {x_1} = {e}, that is, the lead element corresponding to r_1 can only be the empty unit.
r_2 = "said": {x_2} = {what A, what B, e}, that is, the lead element corresponding to r_2 may be the first or the second "what" of the sentence ("what A" or "what B"), or the empty unit.
r_3 = "meant": {x_3} = {what A, what B, e}, that is, the lead element corresponding to r_3 may be the first or the second "what" of the sentence ("what A" or "what B"), or the empty unit.
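The lead-element rule can be sketched as follows. This is an illustrative reading, not the patented implementation: the tuple layout, the type tags "noun", "verb", "sub" and "coord", and the interpretation of "adjacent" as consecutive numbers are all assumptions made for the example.

```python
# (number, text, type) rows of the example table; the type tags are hypothetical.
UNITS = [
    (1, "I", "noun"), (2, "can understand", "verb"), (3, "what A", "sub"),
    (4, "what B", "sub"), (5, "you", "noun"), (6, "said", "verb"),
    (7, "meant", "verb"),
]
E = "e"  # the empty unit


def lead_candidates(units, verb_number):
    """Possible lead elements for the predicate verb unit with this number."""
    subs = [(n, t) for n, t, ty in units if ty == "sub" and n < verb_number]
    coords = [(n, t) for n, t, ty in units if ty == "coord" and n < verb_number]
    # Combination vectors: a coordinating connective directly followed by a
    # subordinating connective (adjacency read as consecutive numbers).
    combos = [(c, s) for cn, c in coords for sn, s in subs if sn == cn + 1]
    return [t for _, t in subs] + [t for _, t in coords] + combos + [E]


if __name__ == "__main__":
    for k, number in enumerate([2, 6, 7], start=1):
        print(f"{{x_{k}}} =", lead_candidates(UNITS, number))
```

The example sentence contains no coordinating connectives, so only the Lead_k part and the empty unit contribute here.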
II. Subject element
Denote the set of subject noun/pronoun units corresponding to each predicate verb unit r_k by {y_k} = NPI_yk ∪ VNP_yk ∪ NOMP_k ∪ G_k ∪ {e} or {y_k} = NPI_yk ∪ VNP_yk ∪ NOMP_k ∪ G_k ∪ fy_k ∪ {e}.
Denote the subject element corresponding to the predicate verb unit r_k by y_k, with set of possible values {y_k}.
Generating the corresponding subject element y_k preferably includes:
(1) When the number of the corresponding predicate verb unit is the smallest predicate verb unit number, the possible values of the subject element are: one of the noun/pronoun units whose number is smaller than that of the corresponding predicate verb unit; or one of the coordinated noun/pronoun combination vectors contained in the family of all coordinated noun/pronoun combination vectors whose largest word unit has a number smaller than that of the corresponding predicate verb unit; or the empty unit.
That is, when r_{k-1} does not exist: {y_k} = NPI_yk ∪ VNP_yk ∪ NOMP_k ∪ G_k ∪ {e};
hence y_k ∈ NPI_yk ∪ VNP_yk ∪ NOMP_k ∪ G_k ∪ {e}.
In the above formula, NPI_yk denotes the set of pure noun units whose number is smaller than that of the corresponding predicate verb unit; VNP_yk denotes the set of nominal verb units whose number is smaller than that of the corresponding predicate verb unit; NOMP_k denotes the set of nominative pronoun units whose number is smaller than that of the corresponding predicate verb unit; G_k denotes the union of the family of all coordinated noun/pronoun combination vectors whose largest word unit has a number smaller than that of the corresponding predicate verb unit; e denotes the empty unit.
(2) When the number of the corresponding predicate verb unit is not the smallest predicate verb unit number, the possible values of the subject element are: one of the noun/pronoun units whose number is smaller than that of the corresponding predicate verb unit; or one of the coordinated noun/pronoun combination vectors contained in the family of all coordinated noun/pronoun combination vectors whose largest word unit has a number smaller than that of the corresponding predicate verb unit; or one of the syntactic vectors corresponding to a preceding predicate verb unit; or the empty unit.
That is, when r_{k-1} exists: {y_k} = NPI_yk ∪ VNP_yk ∪ NOMP_k ∪ G_k ∪ fy_k ∪ {e};
hence y_k ∈ NPI_yk ∪ VNP_yk ∪ NOMP_k ∪ G_k ∪ fy_k ∪ {e}.
In the above formula, NPI_yk, VNP_yk, NOMP_k, G_k and e have the same meanings as in case (1); fy_k denotes the set of syntactic vectors corresponding to the preceding predicate verb units.
For example, for the preprocessed sentence S = "I can understand what what you said meant" shown in Table 1 above:
r_1 = "can understand": r_1 is the lowest-numbered predicate verb unit, so {y_1} = NOMP_1 ∪ {e} = {I, e}.
r_2 = "said": r_2 is not the lowest-numbered predicate verb unit; the only noun/pronoun unit between r_1 and r_2 is "you", and the only function numbered below 2 is f_1, so {y_2} = NOMP_2 ∪ fy_2 ∪ {e} = {I, you} ∪ {f_1} ∪ {e}.
r_3 = "meant": r_3 is not the lowest-numbered predicate verb unit; there is no noun/pronoun unit between r_2 and r_3, and the functions numbered below 3 are f_1 and f_2, so {y_3} = NOMP_3 ∪ fy_3 ∪ {e} = {I, you} ∪ {f_1, f_2} ∪ {e}.
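The subject-element rule for this example can be sketched as below. It is a simplified illustration: the coordinated combination vectors G_k are left out because the sentence has no coordinating connectives, every noun/pronoun unit is treated as a possible subject without distinguishing NPI, VNP and NOMP, and the tuple layout and names are assumptions.

```python
# (number, text, type) rows of the example table; the type tags are hypothetical.
UNITS = [
    (1, "I", "noun"), (2, "can understand", "verb"), (3, "what A", "sub"),
    (4, "what B", "sub"), (5, "you", "noun"), (6, "said", "verb"),
    (7, "meant", "verb"),
]
E = "e"  # the empty unit


def subject_candidates(units, k):
    """Possible subject elements y_k for the k-th predicate verb unit (1-based)."""
    verbs = sorted(n for n, _, ty in units if ty == "verb")
    verb_number = verbs[k - 1]
    # Noun/pronoun units that precede the predicate verb unit.
    nouns = [t for n, t, ty in units if ty == "noun" and n < verb_number]
    # Syntactic vectors f_1 .. f_{k-1} of the preceding predicate verb units.
    earlier = [f"f{j}" for j in range(1, k)]
    return nouns + earlier + [E]


if __name__ == "__main__":
    for k in (1, 2, 3):
        print(f"{{y_{k}}} =", subject_candidates(UNITS, k))
```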
III. Object element
Denote the set of object noun/pronoun units corresponding to each predicate verb unit r_k by {z_k} = NPI_zk ∪ VNP_zk ∪ OBJP_k ∪ H_k ∪ {e} or {z_k} = NPI_zk ∪ VNP_zk ∪ OBJP_k ∪ H_k ∪ fz_k ∪ {e}.
Denote the object element corresponding to the predicate verb unit r_k by z_k, with set of possible values {z_k}.
Generating the corresponding object element z_k preferably includes:
(1) When the number of the corresponding predicate verb unit is the largest predicate verb unit number, the possible values of the object element are: one of the noun/pronoun units whose number is greater than that of the corresponding predicate verb unit; or one of the coordinated noun/pronoun combination vectors contained in the family of all coordinated noun/pronoun combination vectors whose smallest word unit has a number greater than that of the corresponding predicate verb unit; or the empty unit.
That is, when r_{k+1} does not exist: {z_k} = NPI_zk ∪ VNP_zk ∪ OBJP_k ∪ H_k ∪ {e}.
In the above formula, NPI_zk denotes the set of pure noun units whose number is greater than that of the corresponding predicate verb unit; VNP_zk denotes the set of nominal verb units whose number is greater than that of the corresponding predicate verb unit; OBJP_k denotes the set of objective-case pronoun units whose number is greater than that of the corresponding predicate verb unit; H_k denotes the union of the family of all coordinated noun/pronoun combination vectors whose smallest word unit has a number greater than that of the corresponding predicate verb unit; e denotes the empty unit.
(2) When the number of the corresponding predicate verb unit is not the largest predicate verb unit number, the possible values of the object element are: one of the noun/pronoun units whose number is greater than that of the corresponding predicate verb unit and smaller than that of the next following predicate verb unit; or one of the coordinated noun/pronoun combination vectors contained in the family of all coordinated noun/pronoun combination vectors whose smallest word unit has a number greater than that of the corresponding predicate verb unit and smaller than that of the next following predicate verb unit; or one of the syntactic vectors corresponding to a following predicate verb unit; or the empty unit.
That is, when r_{k+1} exists: {z_k} = NPI_zk ∪ VNP_zk ∪ OBJP_k ∪ H_k ∪ fz_k ∪ {e}.
In the above formula, NPI_zk denotes the set of pure noun units whose number is greater than that of the corresponding predicate verb unit and smaller than that of the next following predicate verb unit; VNP_zk denotes the set of nominal verb units within the same number range; OBJP_k denotes the set of objective-case pronoun units within the same number range; H_k denotes the union of the family of all coordinated noun/pronoun combination vectors whose smallest word unit has a number within the same range; fz_k denotes the set of syntactic vectors corresponding to the following predicate verb units; e denotes the empty unit.
For example, for the preprocessed sentence S = "I can understand what what you said meant" shown in Table 1 above:
r_1 = "can understand": r_1 is not the highest-numbered predicate verb unit; the noun/pronoun unit "you" lies between r_1 and r_2, there is no coordinated noun/pronoun combination vector, and the functions numbered above 1 are f_2 and f_3, so {z_1} = OBJP_1 ∪ fz_1 ∪ {e} = {you} ∪ {f_2, f_3} ∪ {e}.
r_2 = "said": r_2 is not the highest-numbered predicate verb unit; there is no noun/pronoun unit between r_2 and r_3 and no coordinated noun/pronoun combination vector, and the only function numbered above 2 is f_3, so {z_2} = fz_2 ∪ {e} = {f_3} ∪ {e}.
r_3 = "meant": r_3 is the highest-numbered predicate verb unit; there is no noun/pronoun unit after r_3, no coordinated noun/pronoun combination vector, and no function numbered above 3, so {z_3} = {e}.
Thus, after step 120, the value sets of all elements have been generated for the above example.
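The object-element rule can be sketched in the same style. As before this is a simplified illustration with assumed names: the coordinated combination vectors H_k are omitted because the example has none, and all noun/pronoun units are treated alike.

```python
# (number, text, type) rows of the example table; the type tags are hypothetical.
UNITS = [
    (1, "I", "noun"), (2, "can understand", "verb"), (3, "what A", "sub"),
    (4, "what B", "sub"), (5, "you", "noun"), (6, "said", "verb"),
    (7, "meant", "verb"),
]
E = "e"  # the empty unit


def object_candidates(units, k):
    """Possible object elements z_k for the k-th predicate verb unit (1-based)."""
    verbs = sorted(n for n, _, ty in units if ty == "verb")
    low = verbs[k - 1]
    # Upper bound: the next predicate verb unit, if there is one.
    high = verbs[k] if k < len(verbs) else float("inf")
    nouns = [t for n, t, ty in units if ty == "noun" and low < n < high]
    # Syntactic vectors f_{k+1} .. f_n of the following predicate verb units.
    later = [f"f{j}" for j in range(k + 1, len(verbs) + 1)]
    return nouns + later + [E]


if __name__ == "__main__":
    for k in (1, 2, 3):
        print(f"{{z_{k}}} =", object_candidates(UNITS, k))
```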
Step 130: From the possible values of the lead element, subject element, predicate element and object element, obtain all possible values of the syntactic vector corresponding to each predicate verb unit. Each syntactic vector consists of a lead element, a subject element, a predicate element and an object element.
As stated above, each subject-predicate collocation structure can be represented as a syntactic vector. From the result of step 120, for the preprocessed sentence S = "I can understand what what you said meant" shown in Table 1 above:
{r_1} = {can understand}
{x_1} = {e}
{y_1} = {I, e}
{z_1} = {you, f_2, f_3, e}
Applying the multiplication principle of combinatorics: f_1(x_1, y_1, r_1, z_1) = (see the list below)
No. | Row matrix f_1 | No. | Row matrix f_1
(1-1) | f_1 = (e, I, r_1, you) | (1-5) | f_1 = (e, e, r_1, you)
(1-2) | f_1 = (e, I, r_1, f_2) | (1-6) | f_1 = (e, e, r_1, f_2)
(1-3) | f_1 = (e, I, r_1, f_3) | (1-7) | f_1 = (e, e, r_1, f_3)
(1-4) | f_1 = (e, I, r_1, e) | (1-8) | f_1 = (e, e, r_1, e)
Replacing the constants with their order values gives: f_1(x_1, y_1, r_1, z_1) = (see the list below)
No. | Row matrix f_1 | No. | Row matrix f_1
(1-1) | f_1 = (e, 1, 2, 5) | (1-5) | f_1 = (e, e, 2, 5)
(1-2) | f_1 = (e, 1, 2, f_2) | (1-6) | f_1 = (e, e, 2, f_2)
(1-3) | f_1 = (e, 1, 2, f_3) | (1-7) | f_1 = (e, e, 2, f_3)
(1-4) | f_1 = (e, 1, 2, e) | (1-8) | f_1 = (e, e, 2, e)
{r_2} = {said}
{x_2} = {what A, what B, e}
{y_2} = {I, you, f_1, e}
{z_2} = {f_3, e}
Applying the multiplication principle of combinatorics: f_2(x_2, y_2, r_2, z_2) = (see the list below)
No. | Row matrix f_2 | No. | Row matrix f_2
(2-1) | f_2 = (what A, I, r_2, f_3) | (2-13) | f_2 = (what B, f_1, r_2, f_3)
(2-2) | f_2 = (what A, I, r_2, e) | (2-14) | f_2 = (what B, f_1, r_2, e)
(2-3) | f_2 = (what A, you, r_2, f_3) | (2-15) | f_2 = (what B, e, r_2, f_3)
(2-4) | f_2 = (what A, you, r_2, e) | (2-16) | f_2 = (what B, e, r_2, e)
(2-5) | f_2 = (what A, f_1, r_2, f_3) | (2-17) | f_2 = (e, I, r_2, f_3)
(2-6) | f_2 = (what A, f_1, r_2, e) | (2-18) | f_2 = (e, I, r_2, e)
(2-7) | f_2 = (what A, e, r_2, f_3) | (2-19) | f_2 = (e, you, r_2, f_3)
(2-8) | f_2 = (what A, e, r_2, e) | (2-20) | f_2 = (e, you, r_2, e)
(2-9) | f_2 = (what B, I, r_2, f_3) | (2-21) | f_2 = (e, f_1, r_2, f_3)
(2-10) | f_2 = (what B, I, r_2, e) | (2-22) | f_2 = (e, f_1, r_2, e)
(2-11) | f_2 = (what B, you, r_2, f_3) | (2-23) | f_2 = (e, e, r_2, f_3)
(2-12) | f_2 = (what B, you, r_2, e) | (2-24) | f_2 = (e, e, r_2, e)
Replacing the constants with their order values gives: f_2(x_2, y_2, r_2, z_2) = (see the list below)
No. | Row matrix f_2 | No. | Row matrix f_2
(2-1) | f_2 = (3, 1, 6, f_3) | (2-13) | f_2 = (4, f_1, 6, f_3)
(2-2) | f_2 = (3, 1, 6, e) | (2-14) | f_2 = (4, f_1, 6, e)
(2-3) | f_2 = (3, 5, 6, f_3) | (2-15) | f_2 = (4, e, 6, f_3)
(2-4) | f_2 = (3, 5, 6, e) | (2-16) | f_2 = (4, e, 6, e)
(2-5) | f_2 = (3, f_1, 6, f_3) | (2-17) | f_2 = (e, 1, 6, f_3)
(2-6) | f_2 = (3, f_1, 6, e) | (2-18) | f_2 = (e, 1, 6, e)
(2-7) | f_2 = (3, e, 6, f_3) | (2-19) | f_2 = (e, 5, 6, f_3)
(2-8) | f_2 = (3, e, 6, e) | (2-20) | f_2 = (e, 5, 6, e)
(2-9) | f_2 = (4, 1, 6, f_3) | (2-21) | f_2 = (e, f_1, 6, f_3)
(2-10) | f_2 = (4, 1, 6, e) | (2-22) | f_2 = (e, f_1, 6, e)
(2-11) | f_2 = (4, 5, 6, f_3) | (2-23) | f_2 = (e, e, 6, f_3)
(2-12) | f_2 = (4, 5, 6, e) | (2-24) | f_2 = (e, e, 6, e)
{r3}={meant}{r 3 }={meant}
{x3}={what A,what B,e}{x 3 }={what A,what B,e}
{y3}={I,you,f1,f2,e}{y 3 }={I,you,f 1 ,f 2 ,e}
{z3}={e}{z 3 }={e}
Applying the multiplication principle of combinatorics gives f3(x3, y3, r3, z3) = (see the list below):
No.    Row matrix f3    No.    Row matrix f3
(3-1)    f3 = (what A, I, r3, e)    (3-9)    f3 = (what B, f2, r3, e)
(3-2)    f3 = (what A, you, r3, e)    (3-10)    f3 = (what B, e, r3, e)
(3-3)    f3 = (what A, f1, r3, e)    (3-11)    f3 = (e, I, r3, e)
(3-4)    f3 = (what A, f2, r3, e)    (3-12)    f3 = (e, you, r3, e)
(3-5)    f3 = (what A, e, r3, e)    (3-13)    f3 = (e, f1, r3, e)
(3-6)    f3 = (what B, I, r3, e)    (3-14)    f3 = (e, f2, r3, e)
(3-7)    f3 = (what B, you, r3, e)    (3-15)    f3 = (e, e, r3, e)
(3-8)    f3 = (what B, f1, r3, e)
Replacing the constants with order values gives f3(x3, y3, r3, z3) = (see the list below):
No.    Row matrix f3    No.    Row matrix f3
(3-1)    f3 = (3, 1, 7, e)    (3-9)    f3 = (4, f2, 7, e)
(3-2)    f3 = (3, 5, 7, e)    (3-10)    f3 = (4, e, 7, e)
(3-3)    f3 = (3, f1, 7, e)    (3-11)    f3 = (e, 1, 7, e)
(3-4)    f3 = (3, f2, 7, e)    (3-12)    f3 = (e, 5, 7, e)
(3-5)    f3 = (3, e, 7, e)    (3-13)    f3 = (e, f1, 7, e)
(3-6)    f3 = (4, 1, 7, e)    (3-14)    f3 = (e, f2, 7, e)
(3-7)    f3 = (4, 5, 7, e)    (3-15)    f3 = (e, e, 7, e)
(3-8)    f3 = (4, f1, 7, e)
Applying the multiplication principle of combinatorics:
|S| = |f1| × |f2| × |f3| = 8 × 24 × 15 = 2880
A total of 2880 possible matrix solutions are therefore generated.
Step 140: Generate at least one possible matrix solution of the syntactic structure from all possible values of all syntactic vectors, where each possible matrix solution consists of syntactic vectors arranged in the order of the predicate verb unit numbers.
For the preprocessed sentence S = "I can understand what what you said meant" shown in Table 1 above, multiple possible matrix solutions can be obtained from the possible values of f1, f2 and f3.
Step 150: Verify whether the sentence obtained from a possible matrix solution of the syntactic structure is identical to the preprocessed sentence; if it is, output the syntactic vectors of that possible matrix solution as one of the syntactic structure parsing results.
Preferably, word unit numbers are used in place of the word units when performing the equal substitution, whole insertion and partial addition operations; whether the resulting sentence is identical to the preprocessed sentence is then judged by whether the resulting sequence is a sequentially increasing sequence of numbers.
Step 150 may include the following steps:
Step 151: If some order value does not appear anywhere in the possible matrix solution, exclude that possible matrix solution; for example, for the following possible matrix solution:
Figure PCTCN2015083760-appb-000597
the word unit numbered 4 does not appear, so the solution is excluded.
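The check in step 151 can be sketched as follows. This is a minimal illustration, not the patented implementation: a syntactic vector is assumed to be a 4-tuple (x, y, r, z) whose elements are an integer order value, a string such as "f2" naming another vector, or None for the empty unit e. The function name `missing_order_values` and the sample candidate are hypothetical; the candidate is not the one shown in the figure, only one with the same defect.

```python
from typing import Dict, Set, Tuple, Union

# Assumed encoding: int = order value, str = reference to another vector, None = e
Element = Union[int, str, None]
Vector = Tuple[Element, Element, Element, Element]


def missing_order_values(solution: Dict[str, Vector], n_units: int) -> Set[int]:
    """Return the word-unit numbers 1..n_units that never occur in the solution."""
    used = {el for vec in solution.values() for el in vec if isinstance(el, int)}
    return set(range(1, n_units + 1)) - used


# Hypothetical candidate for "I can understand what what you said meant" (7 units)
candidate = {
    "f1": (None, 1, 2, "f3"),
    "f2": (None, 5, 6, None),
    "f3": (3, "f2", 7, None),
}

print(missing_order_values(candidate, 7))  # {4}: unit 4 is unused, so exclude
```

A non-empty result means the candidate cannot reproduce the full sentence and can be discarded before any substitution is attempted.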
Step 152: If the same order value appears in different syntactic vectors, or the same syntactic vector appears more than once, exclude the possible matrix solution; for example, for the following possible matrix solution:
Figure PCTCN2015083760-appb-000598
the word unit numbered 5 appears twice, so the solution is excluded.
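A sketch of the step 152 check, under the same assumed encoding (integers for order values, strings such as "f2" for references to other vectors, None for e). The name `repeated_elements` and the sample candidate are illustrative assumptions; the candidate only mimics the defect described above, it is not taken from the figure.

```python
from collections import Counter
from typing import Dict, Set, Tuple, Union

Element = Union[int, str, None]
Vector = Tuple[Element, Element, Element, Element]


def repeated_elements(solution: Dict[str, Vector]) -> Set[Element]:
    """Return order values or vector references that occur more than once."""
    counts = Counter(
        el for vec in solution.values() for el in vec if el is not None
    )
    return {el for el, n in counts.items() if n > 1}


# Hypothetical candidate in which word unit 5 ("you") is used by both f2 and f3
candidate = {
    "f1": (None, 1, 2, "f3"),
    "f2": (4, 5, 6, None),
    "f3": (3, 5, 7, None),
}

print(repeated_elements(candidate))  # {5}: exclude this candidate
```

The empty unit e is deliberately ignored, since it may legitimately occupy several positions.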
Step 153: In each possible matrix solution, perform equal substitution on all syntactic vectors that have substitution relations with other syntactic vectors; if a cross contradiction between two syntactic vectors appears after the equal substitution, exclude the possible matrix solution;
For example, for the following possible matrix solution:
Figure PCTCN2015083760-appb-000599
Substituting into this matrix, f2 and f3 substitute into each other, which produces a cross contradiction. Substitution gives f2=3+<e+<6+<(4+<f2+<7+<e). Because f2 appears on both sides of the equation, this is a logical contradiction, and the solution is excluded.
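The cross contradiction in step 153 amounts to a cycle in the "is substituted into" relation. A minimal sketch, assuming vectors are 4-tuples with string references such as "f3" and None for e, detects it with a depth-first search. The function name is hypothetical, and the value of f1 in the sample is an assumption, since only f2 and f3 are given in the text above.

```python
from typing import Dict, Set, Tuple, Union

Element = Union[int, str, None]
Vector = Tuple[Element, Element, Element, Element]


def has_circular_substitution(solution: Dict[str, Vector]) -> bool:
    """True if some vector ends up (directly or indirectly) containing itself."""
    visiting: Set[str] = set()
    done: Set[str] = set()

    def visit(name: str) -> bool:
        if name in done:
            return False
        if name in visiting:          # reached a vector that is still being expanded
            return True
        visiting.add(name)
        cyclic = any(
            visit(el) for el in solution[name] if isinstance(el, str)
        )
        visiting.discard(name)
        done.add(name)
        return cyclic

    return any(visit(name) for name in solution)


# f2 and f3 as in the example above; f1 is an assumed placeholder value
candidate = {
    "f1": (None, 1, 2, None),
    "f2": (3, None, 6, "f3"),
    "f3": (4, "f2", 7, None),
}

print(has_circular_substitution(candidate))  # True: f2 -> f3 -> f2
```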
Step 154: In each possible matrix solution, perform equal substitution on all syntactic vectors that have substitution relations with other syntactic vectors; if two order values appear in inverted positions after the equal substitution, exclude the possible matrix solution;
For example, for the following possible matrix solution:
Figure PCTCN2015083760-appb-000600
Substitution gives f2=4+<5+<6+<3+<e+<7+<e, that is, the order (4,5,6,3,e,7,e). Order values appear in inverted positions (3 follows 6), so the solution is excluded.
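The substitution and inversion test of step 154 can be sketched as below. The encoding (4-tuples, string references, None for e) and the helper names `flatten` and `has_inversion` are assumptions for illustration. The sample uses f2 = (4, 5, 6, f3) and f3 = (3, e, 7, e), which reproduce the order (4,5,6,3,e,7,e) quoted above; the sketch assumes the references are not circular.

```python
from typing import Dict, List, Sequence, Tuple, Union

Element = Union[int, str, None]
Vector = Tuple[Element, Element, Element, Element]


def flatten(name: str, solution: Dict[str, Vector]) -> List[int]:
    """Equal substitution: expand references recursively and drop empty units."""
    out: List[int] = []
    for el in solution[name]:
        if isinstance(el, str):
            out.extend(flatten(el, solution))   # substitute the referenced vector
        elif el is not None:
            out.append(el)
    return out


def has_inversion(seq: Sequence[int]) -> bool:
    """True if some order value is followed directly by a smaller one."""
    return any(a > b for a, b in zip(seq, seq[1:]))


candidate = {
    "f2": (4, 5, 6, "f3"),
    "f3": (3, None, 7, None),
}

order = flatten("f2", candidate)
print(order)                 # [4, 5, 6, 3, 7]
print(has_inversion(order))  # True: 3 follows 6, so exclude
```

Checking adjacent pairs is sufficient, because a sequence with no adjacent descent is increasing throughout.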
Step 155: In any possible matrix solution, if there is a syntactic vector that has no substitution relation with the other syntactic vectors, perform the gap-insertion operation to obtain all possible syntactic parsing structures corresponding to that possible matrix solution, and verify whether the sentence obtained from each possible syntactic parsing structure is identical to the preprocessed sentence. This further comprises:
Step 155.1: First perform equal substitution on the syntactic vectors in the possible matrix solution that have substitution relations with one another, thereby converting the possible matrix solution into a set of syntactic vectors that have no substitution relations with one another:
Figure PCTCN2015083760-appb-000601
The syntactic vectors in the possible matrix solution are called first-type syntactic vectors, and the converted syntactic vectors
Figure PCTCN2015083760-appb-000602
are called second-type syntactic vectors;
Step 155.2: Take any second-type syntactic vector
Figure PCTCN2015083760-appb-000603
and, in a predetermined direction, label one by one the order value of every syntactic element in
Figure PCTCN2015083760-appb-000604
After the order values have been labeled, take the i-th syntactic element in
Figure PCTCN2015083760-appb-000605
and construct a single gap on the first side of that syntactic element only. After the gap has been constructed, take any second-type syntactic vector other than
Figure PCTCN2015083760-appb-000606
namely
Figure PCTCN2015083760-appb-000607
and insert the syntactic vector
Figure PCTCN2015083760-appb-000608
as a whole into the constructed gap, thereby generating a new syntactic vector, which is denoted
Figure PCTCN2015083760-appb-000609
The syntactic vectors obtained by whole insertion are collectively called third-type syntactic vectors;
Step 155.3: For a third-type syntactic vector
Figure PCTCN2015083760-appb-000610
in the predetermined direction, label with order values every syntactic element from the first syntactic element on the first side of vector
Figure PCTCN2015083760-appb-000611
up to the first syntactic element on the second side of the vector
Figure PCTCN2015083760-appb-000613
contained in vector
Figure PCTCN2015083760-appb-000612
Elements located on the first side of the vector
Figure PCTCN2015083760-appb-000615
contained in vector
Figure PCTCN2015083760-appb-000614
are not labeled with order values. The first syntactic element on the second side of vector
Figure PCTCN2015083760-appb-000616
is denoted
Figure PCTCN2015083760-appb-000617
The part of vector
Figure PCTCN2015083760-appb-000618
that is labeled in this way is denoted as the tail syntactic vector
Figure PCTCN2015083760-appb-000619
After the order values have been labeled, take the j-th syntactic element of this tail vector and construct a single gap on the first side of that element only. After the gap has been constructed, take any unused second-type syntactic vector
Figure PCTCN2015083760-appb-000620
and insert the syntactic vector
Figure PCTCN2015083760-appb-000621
as a whole into the constructed gap, thereby generating a new syntactic vector, which is denoted
Figure PCTCN2015083760-appb-000622
or
for a third-type syntactic vector
Figure PCTCN2015083760-appb-000623
in the predetermined direction, label with order values every syntactic element in the syntactic vector
Figure PCTCN2015083760-appb-000624
After the order values have been labeled, take the t-th syntactic element in
Figure PCTCN2015083760-appb-000625
and construct a single gap on the first side of that syntactic element. After the gap has been constructed, take any unused second-type syntactic vector
Figure PCTCN2015083760-appb-000626
and insert the vector
Figure PCTCN2015083760-appb-000627
as a whole into the previously constructed gap, thereby generating a new vector, which is denoted
Figure PCTCN2015083760-appb-000628
Step 155.4: Repeat step 155.3. Each time a gap-construction and insertion pass finishes, perform the next gap-construction and insertion pass on the third-type syntactic vector obtained from the previous pass, until all second-type syntactic vectors
Figure PCTCN2015083760-appb-000629
have been inserted. The result is a single-row third-type syntactic vector, which is called the final single-row vector;
Step 155.5: If every final single-row vector corresponding to a possible syntactic parsing structure contains two order values in inverted positions, exclude that possible syntactic parsing structure;
Step 155.6: Repeat steps 155.2 to 155.5 until all possible syntactic parsing structures have been traversed.
For example, for the example described above, one possible matrix solution of the syntactic structure is:
Figure PCTCN2015083760-appb-000630
Converting this matrix into a linear expression gives:
Figure PCTCN2015083760-appb-000631
After the gap-insertion operation described above, every final single-row vector contains two order values in inverted positions, so the solution is excluded.
For the example described above, another possible matrix solution of the syntactic structure is:
Figure PCTCN2015083760-appb-000632
The matrix can be converted into a linear expression:
Figure PCTCN2015083760-appb-000633
Performing the equal substitution operation gives the sentence:
α=e+<1+<2+<(3+<(4+<5+<6+<e)+<7+<e)
Removing the empty unit e gives:
α=1+<2+<(3+<(4+<5+<6)+<7)
This is identical to the preprocessed sentence, so this nested structure is one of the syntactic structure parsing results.
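The nested linear expression above can be reproduced with a short sketch. The matrix itself is only available as a figure, so the vectors below are inferred from the expression α: f1 = (e, 1, 2, f3), f2 = (4, 5, 6, e), f3 = (3, f2, 7, e). The tuple encoding and the function name `linear_expression` are assumptions for illustration.

```python
from typing import Dict, Tuple, Union

Element = Union[int, str, None]
Vector = Tuple[Element, Element, Element, Element]


def linear_expression(name: str, solution: Dict[str, Vector],
                      keep_empty: bool = True) -> str:
    """Render a vector as a '+<' chain, parenthesising substituted vectors."""
    parts = []
    for el in solution[name]:
        if isinstance(el, str):
            parts.append("(" + linear_expression(el, solution, keep_empty) + ")")
        elif el is None:
            if keep_empty:
                parts.append("e")
        else:
            parts.append(str(el))
    return "+<".join(parts)


# Vectors inferred from the linear expression given in the text
solution = {
    "f1": (None, 1, 2, "f3"),
    "f2": (4, 5, 6, None),
    "f3": (3, "f2", 7, None),
}

print(linear_expression("f1", solution))
# e+<1+<2+<(3+<(4+<5+<6+<e)+<7+<e)
print(linear_expression("f1", solution, keep_empty=False))
# 1+<2+<(3+<(4+<5+<6)+<7)
```

Reading the digits of the second string from left to right gives 1 through 7 in increasing order, which is the condition for the candidate to match the preprocessed sentence.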
Substituting the word unit constants into the above matrix, the matrix solution of the syntactic structure can be expressed as:
Figure PCTCN2015083760-appb-000634
The linear expression of S corresponding to this matrix expression is:
Figure PCTCN2015083760-appb-000635
Accordingly, the sentence "I can understand what what you said meant" is parsed as follows: "I" is the subject of the main clause; "can understand" is the predicate of the main clause; and the clause "what what you said meant" is the object clause of the main clause. Within that object clause, the first "what" is the clause introducer, "what you said" is the subject, "meant" is the predicate, and the object clause itself has no object. The clause "what you said" is a subject clause nested inside the object clause, in which "what" is the introducer, "you" is the subject and "said" is the predicate.
Further, the method may also include a display step, in which the syntactic vectors in the parsing result and the corresponding syntactic structure relations are displayed as a tree structure in a human-computer interaction interface.
Part C2: Example 2
Example 2: As another example, the following describes how the method of this embodiment parses a sentence with a complex structure such as "That men who were appointed didn’t bother the liberals wasn’t remarked upon by the press."
After preprocessing to remove impurities and numbering, the word sequence table of the above sentence is:
Original phrase    Phrase type    Order number
That    Subordinate connective unit    1
men    Noun/pronoun unit    2
who    Subordinate connective unit    3
were appointed    Predicate verb unit    4
didn’t bother    Predicate verb unit    5
the liberals    Noun/pronoun unit    6
wasn’t remarked    Predicate verb unit    7
The sentence has three predicate verb units, denoted r1, r2 and r3.
For r1: {r1} = {were appointed}
{x1} = {That, who, e} (e is the empty string)
{y1} = {men, e}
{z1} = {f2, f3, e}
Applying the multiplication principle of combinatorics gives f1(x1, y1, r1, z1) = (see the list below):
No.    Row matrix f1    No.    Row matrix f1
(1-1)    f1 = (That, men, r1, f2)    (1-10)    f1 = (That, e, r1, f3)
(1-2)    f1 = (who, men, r1, f2)    (1-11)    f1 = (who, e, r1, f3)
(1-3)    f1 = (e, men, r1, f2)    (1-12)    f1 = (e, e, r1, f3)
(1-4)    f1 = (That, e, r1, f2)    (1-13)    f1 = (That, men, r1, e)
(1-5)    f1 = (who, e, r1, f2)    (1-14)    f1 = (who, men, r1, e)
(1-6)    f1 = (e, e, r1, f2)    (1-15)    f1 = (e, men, r1, e)
(1-7)    f1 = (That, men, r1, f3)    (1-16)    f1 = (That, e, r1, e)
(1-8)    f1 = (who, men, r1, f3)    (1-17)    f1 = (who, e, r1, e)
(1-9)    f1 = (e, men, r1, f3)    (1-18)    f1 = (e, e, r1, e)
Replacing the constants with order values gives f1(x1, y1, r1, z1) = (see the list below):
No.    Row matrix f1    No.    Row matrix f1
(1-1)    f1 = (1, 2, 4, f2)    (1-10)    f1 = (1, e, 4, f3)
(1-2)    f1 = (3, 2, 4, f2)    (1-11)    f1 = (3, e, 4, f3)
(1-3)    f1 = (e, 2, 4, f2)    (1-12)    f1 = (e, e, 4, f3)
(1-4)    f1 = (1, e, 4, f2)    (1-13)    f1 = (1, 2, 4, e)
(1-5)    f1 = (3, e, 4, f2)    (1-14)    f1 = (3, 2, 4, e)
(1-6)    f1 = (e, e, 4, f2)    (1-15)    f1 = (e, 2, 4, e)
(1-7)    f1 = (1, 2, 4, f3)    (1-16)    f1 = (1, e, 4, e)
(1-8)    f1 = (3, 2, 4, f3)    (1-17)    f1 = (3, e, 4, e)
(1-9)    f1 = (e, 2, 4, f3)    (1-18)    f1 = (e, e, 4, e)
For r2: {r2} = {didn’t bother}
{x2} = {That, who, e} (e is the empty string)
{y2} = {men, f1, e}
{z2} = {the liberals, f3, e}
Applying the multiplication principle of combinatorics gives f2(x2, y2, r2, z2) = (see the list below):
No.    Row matrix f2    No.    Row matrix f2
(2-1)    f2 = (That, men, r2, the liberals)    (2-15)    f2 = (e, f1, r2, f3)
(2-2)    f2 = (who, men, r2, the liberals)    (2-16)    f2 = (That, e, r2, f3)
(2-3)    f2 = (e, men, r2, the liberals)    (2-17)    f2 = (who, e, r2, f3)
(2-4)    f2 = (That, f1, r2, the liberals)    (2-18)    f2 = (e, e, r2, f3)
(2-5)    f2 = (who, f1, r2, the liberals)    (2-19)    f2 = (That, men, r2, e)
(2-6)    f2 = (e, f1, r2, the liberals)    (2-20)    f2 = (who, men, r2, e)
(2-7)    f2 = (That, e, r2, the liberals)    (2-21)    f2 = (e, men, r2, e)
(2-8)    f2 = (who, e, r2, the liberals)    (2-22)    f2 = (That, f1, r2, e)
(2-9)    f2 = (e, e, r2, the liberals)    (2-23)    f2 = (who, f1, r2, e)
(2-10)    f2 = (That, men, r2, f3)    (2-24)    f2 = (e, f1, r2, e)
(2-11)    f2 = (who, men, r2, f3)    (2-25)    f2 = (That, e, r2, e)
(2-12)    f2 = (e, men, r2, f3)    (2-26)    f2 = (who, e, r2, e)
(2-13)    f2 = (That, f1, r2, f3)    (2-27)    f2 = (e, e, r2, e)
(2-14)    f2 = (who, f1, r2, f3)
Replacing the constants with order values gives f2(x2, y2, r2, z2) = (see the list below):
No.    Row matrix f2    No.    Row matrix f2
(2-1)    f2 = (1, 2, 5, 6)    (2-15)    f2 = (e, f1, 5, f3)
(2-2)    f2 = (3, 2, 5, 6)    (2-16)    f2 = (1, e, 5, f3)
(2-3)    f2 = (e, 2, 5, 6)    (2-17)    f2 = (3, e, 5, f3)
(2-4)    f2 = (1, f1, 5, 6)    (2-18)    f2 = (e, e, 5, f3)
(2-5)    f2 = (3, f1, 5, 6)    (2-19)    f2 = (1, 2, 5, e)
(2-6)    f2 = (e, f1, 5, 6)    (2-20)    f2 = (3, 2, 5, e)
(2-7)    f2 = (1, e, 5, 6)    (2-21)    f2 = (e, 2, 5, e)
(2-8)    f2 = (3, e, 5, 6)    (2-22)    f2 = (1, f1, 5, e)
(2-9)    f2 = (e, e, 5, 6)    (2-23)    f2 = (3, f1, 5, e)
(2-10)    f2 = (1, 2, 5, f3)    (2-24)    f2 = (e, f1, 5, e)
(2-11)    f2 = (3, 2, 5, f3)    (2-25)    f2 = (1, e, 5, e)
(2-12)    f2 = (e, 2, 5, f3)    (2-26)    f2 = (3, e, 5, e)
(2-13)    f2 = (1, f1, 5, f3)    (2-27)    f2 = (e, e, 5, e)
(2-14)    f2 = (3, f1, 5, f3)
For r3: {r3} = {wasn’t remarked}
{x3} = {That, who, e}
{y3} = {men, the liberals, f1, f2, e}
{z3} = {e}
Applying the multiplication principle of combinatorics gives f3(x3, y3, r3, z3) = (see the list below):
No.    Row matrix f3    No.    Row matrix f3
(3-1)    f3 = (That, men, r3, e)    (3-9)    f3 = (e, f1, r3, e)
(3-2)    f3 = (who, men, r3, e)    (3-10)    f3 = (That, f2, r3, e)
(3-3)    f3 = (e, men, r3, e)    (3-11)    f3 = (who, f2, r3, e)
(3-4)    f3 = (That, the liberals, r3, e)    (3-12)    f3 = (e, f2, r3, e)
(3-5)    f3 = (who, the liberals, r3, e)    (3-13)    f3 = (That, e, r3, e)
(3-6)    f3 = (e, the liberals, r3, e)    (3-14)    f3 = (who, e, r3, e)
(3-7)    f3 = (That, f1, r3, e)    (3-15)    f3 = (e, e, r3, e)
(3-8)    f3 = (who, f1, r3, e)
Replacing the constants with order values gives f3(x3, y3, r3, z3) = (see the list below):
No.    Row matrix f3    No.    Row matrix f3
(3-1)    f3 = (1, 2, 7, e)    (3-9)    f3 = (e, f1, 7, e)
(3-2)    f3 = (3, 2, 7, e)    (3-10)    f3 = (1, f2, 7, e)
(3-3)    f3 = (e, 2, 7, e)    (3-11)    f3 = (3, f2, 7, e)
(3-4)    f3 = (1, 6, 7, e)    (3-12)    f3 = (e, f2, 7, e)
(3-5)    f3 = (3, 6, 7, e)    (3-13)    f3 = (1, e, 7, e)
(3-6)    f3 = (e, 6, 7, e)    (3-14)    f3 = (3, e, 7, e)
(3-7)    f3 = (1, f1, 7, e)    (3-15)    f3 = (e, e, 7, e)
(3-8)    f3 = (3, f1, 7, e)
Applying the multiplication principle of combinatorics:
|S| = |f1| × |f2| × |f3| = 18 × 27 × 15 = 7290
A total of 7290 possible matrix solutions are therefore generated.
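The count above can be checked by enumerating the Cartesian products directly. The sketch below is illustrative only: it assumes each syntactic vector is a 4-tuple (x, y, r, z), with integers for order values, strings such as "f2" for references to other vectors, and None for e. The helper name `candidates` is hypothetical, and the enumeration order need not match the numbering used in the lists above.

```python
from itertools import product

E = None  # the empty unit e


def candidates(xs, ys, r, zs):
    """All (x, y, r, z) combinations for one predicate verb unit r."""
    return [(x, y, r, z) for x, y, z in product(xs, ys, zs)]


# Candidate sets for Example 2, taken from the {x}, {y}, {z} sets listed above
f1 = candidates([1, 3, E], [2, E], 4, ["f2", "f3", E])
f2 = candidates([1, 3, E], [2, "f1", E], 5, [6, "f3", E])
f3 = candidates([1, 3, E], [2, 6, "f1", "f2", E], 7, [E])

# One possible matrix solution = one choice of value for each of f1, f2, f3
solutions = list(product(f1, f2, f3))

print(len(f1), len(f2), len(f3), len(solutions))  # 18 27 15 7290
```

Materialising all 7290 triples is cheap at this scale; for sentences with more predicate verb units, iterating over `product` lazily would avoid holding every candidate in memory.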
Running the matrix substitution solver and the structure correction procedure on all possible matrix solutions of the syntactic structure yields the possible matrix solution that is the final parsing result:
Figure PCTCN2015083760-appb-000636
This sentence is a typical example that the whole-insertion method described here handles successfully. After the whole-insertion processing described above, one of the whole-insertion results for the above possible matrix solution is the following final single-row vector: e+<(1+<2+<(3+<e+<4+<e)+<5+<6)+<7+<e. This final single-row vector contains no inversions, so it is a valid final single-row vector, and it is identical to the original sentence. This possible matrix solution is therefore the correct syntactic structure parsing result for the sentence.
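The whole-insertion idea behind this result can be sketched in simplified form. In the final single-row vector above, the sequence (3, 4) from f1 is inserted as a block between 2 and 5 of the sequence (1, 2, 5, 6, 7) obtained from f3 after f2 is substituted into it. The sketch below tries every gap rather than following the labeling and tail-vector rules of steps 155.2 and 155.3, so it is an assumption-laden approximation, and the function names are hypothetical.

```python
from typing import Iterator, List


def whole_insertions(base: List[int], block: List[int]) -> Iterator[List[int]]:
    """Yield every result of inserting `block`, unbroken, into one gap of `base`."""
    for gap in range(len(base) + 1):
        yield base[:gap] + block + base[gap:]


def can_merge_in_order(vectors: List[List[int]]) -> bool:
    """True if repeated whole insertion can yield a strictly increasing row."""
    if len(vectors) == 1:
        seq = vectors[0]
        return all(a < b for a, b in zip(seq, seq[1:]))
    for i, base in enumerate(vectors):
        for j, block in enumerate(vectors):
            if i == j:
                continue
            rest = [v for k, v in enumerate(vectors) if k not in (i, j)]
            for merged in whole_insertions(base, block):
                if can_merge_in_order(rest + [merged]):
                    return True
    return False


# Second-type vectors for the solution above, with empty units already removed
independent = [[1, 2, 5, 6, 7], [3, 4]]
print(can_merge_in_order(independent))        # True: [3, 4] fits between 2 and 5
print(can_merge_in_order([[1, 3], [2, 4]]))   # False: the blocks would interleave
```

The second call shows why insertion must be whole: two clauses whose word units interleave cannot be nested, so such a candidate has no inversion-free final single-row vector.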
Restoring the numbers in the above possible matrix solution to word units gives the following form:
Figure PCTCN2015083760-appb-000637
Converting this matrix into a linear expression gives:
Figure PCTCN2015083760-appb-000638
Removing e gives:
Figure PCTCN2015083760-appb-000639
This yields the correct parse of the example sentence: f3 is the main clause, that is, the core clause; f2 is the subject of f3, that is, a subject clause; and f1 is an attributive clause modifying "men".
This example illustrates the advantage of the method well. For the above sentence, two natural language syntactic structure parsers that the computer industry currently regards as among the most advanced in the world, the Berkeley Parser and the Stanford Parser, still gave an incorrect parse as of the filing of this application. The two parsers gave identical results, as follows:
① That men didn’t bother;
② who were appointed;
③ the liberals wasn’t remarked upon by the press.
① is the main clause, that is, the core clause; ③ is the object of ①, that is, an object clause; ② is an attributive clause modifying "men"; and "That" is a determiner modifying "men".
In English, if a subject clause stands at the beginning of the sentence and is introduced by "that", the word "that" cannot be omitted, even in spoken language. In the method of the present invention, because sentences are processed as syntactic vectors, sufficient room is reserved during parsing for the subject clause "That men didn’t bother the liberals", which fully preserves the possibility that it is generated as a complete clause.
As of the filing of this application, the two world-leading natural language syntactic structure parsers mentioned above had still not remedied this major technical defect, namely that subject clauses introduced by "that" are often parsed incorrectly.
Part C3: Example 3
Example 3: As another example, the following describes how the method of this embodiment parses a sentence with a complex structure such as "Jack who has a beautiful car is a businessman." After preprocessing to remove impurities and numbering, the word sequence table of the above sentence is:
Original phrase    Phrase type    Order number
Jack    Noun/pronoun unit    1
who    Subordinate connective unit    2
has    Predicate verb unit    3
a car    Noun/pronoun unit    4
is    Predicate verb unit    5
a businessman    Noun/pronoun unit    6
The sentence has two predicate verb units, denoted r1 and r2.
For r1: {r1} = {has}
{x1} = {who, e} (e is the empty string)
{y1} = {Jack, e}
{z1} = {a car, f2, e}
Applying the multiplication principle of combinatorics gives f1(x1, y1, r1, z1) = (see the list below):
No.    Row matrix f1    No.    Row matrix f1
(1-1)    f1 = (who, Jack, r1, a car)    (1-7)    f1 = (who, e, r1, f2)
(1-2)    f1 = (e, Jack, r1, a car)    (1-8)    f1 = (e, e, r1, f2)
(1-3)    f1 = (who, e, r1, a car)    (1-9)    f1 = (who, Jack, r1, e)
(1-4)    f1 = (e, e, r1, a car)    (1-10)    f1 = (e, Jack, r1, e)
(1-5)    f1 = (who, Jack, r1, f2)    (1-11)    f1 = (who, e, r1, e)
(1-6)    f1 = (e, Jack, r1, f2)    (1-12)    f1 = (e, e, r1, e)
Replacing the constants with order values gives f1(x1, y1, r1, z1) = (see the list below):
No.    Row matrix f1    No.    Row matrix f1
(1-1)    f1 = (2, 1, 3, 4)    (1-7)    f1 = (2, e, 3, f2)
(1-2)    f1 = (e, 1, 3, 4)    (1-8)    f1 = (e, e, 3, f2)
(1-3)    f1 = (2, e, 3, 4)    (1-9)    f1 = (2, 1, 3, e)
(1-4)    f1 = (e, e, 3, 4)    (1-10)    f1 = (e, 1, 3, e)
(1-5)    f1 = (2, 1, 3, f2)    (1-11)    f1 = (2, e, 3, e)
(1-6)    f1 = (e, 1, 3, f2)    (1-12)    f1 = (e, e, 3, e)
For r2: {r2} = {is}
{x2} = {who, e} (e is the empty string)
{y2} = {Jack, a car, f1, e}
{z2} = {a businessman, e}
Applying the multiplication principle of combinatorics gives f2(x2, y2, r2, z2) = (see the list below):
No.    Row matrix f2    No.    Row matrix f2
(2-1)    f2 = (who, Jack, r2, a businessman)    (2-9)    f2 = (who, Jack, r2, e)
(2-2)    f2 = (e, Jack, r2, a businessman)    (2-10)    f2 = (e, Jack, r2, e)
(2-3)    f2 = (who, a car, r2, a businessman)    (2-11)    f2 = (who, a car, r2, e)
(2-4)    f2 = (e, a car, r2, a businessman)    (2-12)    f2 = (e, a car, r2, e)
(2-5)    f2 = (who, f1, r2, a businessman)    (2-13)    f2 = (who, f1, r2, e)
(2-6)    f2 = (e, f1, r2, a businessman)    (2-14)    f2 = (e, f1, r2, e)
(2-7)    f2 = (who, e, r2, a businessman)    (2-15)    f2 = (who, e, r2, e)
(2-8)    f2 = (e, e, r2, a businessman)    (2-16)    f2 = (e, e, r2, e)
Replacing the constants with sequence values: f2(x2,y2,r2,z2) = (see the list below)
Serial number  Row matrix f2            Serial number  Row matrix f2
(2-1)  f2=(2,1,5,6)             (2-9)   f2=(2,1,5,e)
(2-2)  f2=(e,1,5,6)             (2-10)  f2=(e,1,5,e)
(2-3)  f2=(2,4,5,6)             (2-11)  f2=(2,4,5,e)
(2-4)  f2=(e,4,5,6)             (2-12)  f2=(e,4,5,e)
(2-5)  f2=(2,f1,5,6)            (2-13)  f2=(2,f1,5,e)
(2-6)  f2=(e,f1,5,6)            (2-14)  f2=(e,f1,5,e)
(2-7)  f2=(2,e,5,6)             (2-15)  f2=(2,e,5,e)
(2-8)  f2=(e,e,5,6)             (2-16)  f2=(e,e,5,e)
Applying the multiplication principle of combinatorics:
|S|=|f1|×|f2|=12×16=192
A total of 192 possible matrix solutions are therefore generated.
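The count above can be reproduced by enumerating the Cartesian products directly. The following Python sketch is illustrative only: the element sets for f1 are read off the twelve listed rows rather than stated in this passage, the names `row_matrices`, `f1_rows`, `f2_rows` and `solutions` are hypothetical, and the enumeration order differs from the serial numbers used in the tables.

```python
from itertools import product

E = "e"  # the empty-string element used throughout the example

# Candidate sets for f1 (inferred from rows (1-1) to (1-12)) and for f2.
x1, y1, z1 = ["who", E], ["Jack", E], ["a car", "f2", E]
x2, y2, z2 = ["who", E], ["Jack", "a car", "f1", E], ["a businessman", E]


def row_matrices(xs, ys, verb, zs):
    """Every (leader, subject, verb, object) row matrix for one predicate verb."""
    return [(x, y, verb, z) for x, y, z in product(xs, ys, zs)]


f1_rows = row_matrices(x1, y1, "r1", z1)
f2_rows = row_matrices(x2, y2, "r2", z2)

# A possible matrix solution pairs one row of f1 with one row of f2.
solutions = list(product(f1_rows, f2_rows))

print(len(f1_rows), len(f2_rows), len(solutions))  # 12 16 192
```

Materialising every pair is affordable here, but the product grows multiplicatively with each additional predicate verb, which is why the later examples reach thousands of candidates.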
The possible matrix solution that serves as the final result of syntactic structure parsing is:
Figure PCTCN2015083760-appb-000640
This example sentence is a typical one that the overall insertion method of this document handles successfully. After the overall insertion processing described earlier, the possible matrix solution above yields the only final single-row vector that contains no inversions: e+<1+<(2+<e+<3+<4)+<5+<6. This final single-row vector is a valid one: its syntactic sequence values appear in exactly the same order as in the original sentence. This possible matrix solution is also the correct syntactic parse of the example sentence.
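The "no inversions" condition on a final single-row vector can be checked mechanically. The sketch below is an illustration under stated assumptions, not the procedure of the embodiment: the vector is written as a nested tuple, the empty element e is ignored, and the helper names `flatten` and `inversion_count` are hypothetical.

```python
from itertools import combinations

E = "e"  # empty element, skipped when comparing sequence values


def flatten(vector):
    """Yield the elements of a nested row vector from left to right."""
    for item in vector:
        if isinstance(item, (list, tuple)):
            yield from flatten(item)
        else:
            yield item


def inversion_count(vector):
    """Count pairs of sequence values that appear in descending order."""
    values = [v for v in flatten(vector) if v != E]
    return sum(1 for a, b in combinations(values, 2) if a > b)


# e+<1+<(2+<e+<3+<4)+<5+<6, with f1 inserted after the subject of f2.
final_vector = (E, 1, (2, E, 3, 4), 5, 6)
print(inversion_count(final_vector))   # 0

# Row (1-1), f1=(2,1,3,4), places "who" before "Jack" and so has one inversion.
print(inversion_count((2, 1, 3, 4)))   # 1
```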
Restoring the numbers in the above possible matrix solution to word units gives the following form:
Figure PCTCN2015083760-appb-000641
Converting this matrix to a linear expression:
Figure PCTCN2015083760-appb-000642
Removing e gives:
Figure PCTCN2015083760-appb-000643
Part C4 Example 4
Example 4: As a further example, the following describes how the method of this embodiment parses a sentence with a parallel structure, such as "After Jack, Mary and Linda left, I gave my son a new book."
After preprocessing to remove impurities and numbering, the word sequence table for the above sentence is:
Original phrase  Phrase type                    Sequence number
After            Subordinate connective unit    1
Jack             Noun-pronoun unit              2
Mary             Noun-pronoun unit              3
and              Coordinate connective unit     4
Linda            Noun-pronoun unit              5
left             Predicate verb unit            6
I                Noun-pronoun unit              7
gave             Predicate verb unit            8
my son           Noun-pronoun unit              9
a book           Noun-pronoun unit              10
Families of parallel noun-pronoun combination vectors are generated by the following steps:
S2.1 Select two distinct noun-pronoun units:
A. If there is no other word unit between these two noun-pronoun units, take the two units as a parallel noun-pronoun combination vector and retain it;
B. If there are other word units between these two noun-pronoun units, check each of them: if every word unit between the two noun-pronoun units is a noun-pronoun unit or a coordinate connective unit, take the two selected noun-pronoun units together with all the word units between them as one parallel noun-pronoun combination vector and retain it; otherwise, generate no parallel noun-pronoun combination vector;
S2.2 Repeat S2.1 until every pairing of noun-pronoun units has been traversed, yielding all parallel noun-pronoun combination vectors;
S2.3 If the possible syntactic parsing structure contains parallel noun-pronoun combination vectors, partition all of them into a number of families such that, within each family, every parallel noun-pronoun combination vector contains the same two noun-pronoun units.
S2.4 In each family, select the highest-numbered word unit contained in any of its combination vectors as the maximum word unit of the family, for later use when generating subjects; and select the lowest-numbered word unit contained in any of its combination vectors as the minimum word unit of the family, for later use when generating objects.
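Steps S2.1 and S2.2 can be sketched in Python as follows. This is a minimal illustration on the word sequence table of Example 4, assuming hypothetical type tags `SUB`, `NP`, `CONJ` and `V` for the four unit types and a hypothetical helper `parallel_spans`; it covers only the pair selection and the "nothing but noun-pronoun or coordinate connective units in between" test, not the family partition of S2.3.

```python
from itertools import combinations

# Sequence number -> unit type for "After Jack, Mary and Linda left, I gave my son a new book."
units = {1: "SUB", 2: "NP", 3: "NP", 4: "CONJ", 5: "NP",
         6: "V", 7: "NP", 8: "V", 9: "NP", 10: "NP"}


def parallel_spans(units):
    """Return every retained combination vector as a list of sequence numbers."""
    nouns = sorted(n for n, kind in units.items() if kind == "NP")
    spans = []
    for first, last in combinations(nouns, 2):          # S2.1: two distinct units
        between = range(first + 1, last)                # empty when adjacent (case A)
        if all(units[i] in ("NP", "CONJ") for i in between):   # case B test
            spans.append(list(range(first, last + 1)))
    return spans                                        # S2.2: all pairs traversed


print(parallel_spans(units))
# [[2, 3], [2, 3, 4, 5], [3, 4, 5], [9, 10]]
```

Any pair whose span crosses a predicate verb unit (6 or 8) is rejected, so only the word strings inside "Jack, Mary and Linda" and the adjacent pair "my son", "a book" survive.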
In generating the subject element sets {y1} and {y2}, the parallel subject generation algorithm runs as follows:
① A(S) extracts all NPI phrases, all VNP phrases, and all NOMP phrases from the original sentence and collects them into one set, denoted Ψ={Jack,Mary,Linda,I,my son,a book}={2,3,5,7,9,10}.
② B(Ψ) takes, in the manner of
Figure PCTCN2015083760-appb-000644
all combinations of any two elements of the set Ψ={2,3,5,7,9,10}. Let the set
Figure PCTCN2015083760-appb-000645
Figure PCTCN2015083760-appb-000646
Then B(Ψ)={{2,3},{2,5},{2,7},{2,9},{2,10},{3,5},{3,7},{3,9},{3,10},{5,7},{5,9},{5,10},{7,9},{7,10},{9,10}}.
③ K(α,β) operates on the result of the unary function B(Ψ): for any given
Figure PCTCN2015083760-appb-000647
arrange the elements
Figure PCTCN2015083760-appb-000648
in ascending order of their syntactic sequence values in the original sentence S. Without loss of generality, let
Figure PCTCN2015083760-appb-000649
The ordered pair
Figure PCTCN2015083760-appb-000650
is then obtained, so that
Figure PCTCN2015083760-appb-000651
The generated ordered pairs are:
{<2,3>,<2,5>,<2,7>,<2,9>,<2,10>,<3,5>,<3,7>,<3,9>,<3,10>,<5,7>,<5,9>,<5,10>,<7,9>,<7,10>,<9,10>}.
Let the set
Figure PCTCN2015083760-appb-000652
and then establish a continuous word-string formula
Figure PCTCN2015083760-appb-000653
where
Figure PCTCN2015083760-appb-000654
is the run of adjacent consecutive words (or the empty word string) in the original sentence S from
Figure PCTCN2015083760-appb-000655
to
Figure PCTCN2015083760-appb-000656
and
Figure PCTCN2015083760-appb-000657
Then
Figure PCTCN2015083760-appb-000658
Figure PCTCN2015083760-appb-000659
so that Φ1=2+<e+<3, Φ2=2+<3+<4+<5, Φ3=2+<3+<4+<5+<6+<7, Φ4=2+<3+<4+<5+<6+<7+<8+<9, Φ5=2+<3+<4+<5+<6+<7+<8+<9+<10, Φ6=3+<4+<5, Φ7=3+<4+<5+<6+<7, Φ8=3+<4+<5+<6+<7+<8+<9, Φ9=3+<4+<5+<6+<7+<8+<9+<10, Φ10=5+<6+<7, Φ11=5+<6+<7+<8+<9, Φ12=5+<6+<7+<8+<9+<10, Φ13=7+<8+<9, Φ14=7+<8+<9+<10, Φ15=9+<e+<10.
④ H(Φt) checks each
Figure PCTCN2015083760-appb-000660
generated by the binary function K(α,β): if, for every element γ∈Φt with
Figure PCTCN2015083760-appb-000661
and
Figure PCTCN2015083760-appb-000662
it holds that γ=NPI or γ=VNP or γ=NOMP or γ=CONJ or γ=e, then the label of Φt is changed to
Figure PCTCN2015083760-appb-000663
and Φt is said to generate
Figure PCTCN2015083760-appb-000664
Let the set
Figure PCTCN2015083760-appb-000665
Then the set
Figure PCTCN2015083760-appb-000666
Figure PCTCN2015083760-appb-000667
⑤ M(α,β) is defined as follows. For any chosen set
Figure PCTCN2015083760-appb-000668
if the set
Figure PCTCN2015083760-appb-000669
has a corresponding
Figure PCTCN2015083760-appb-000670
then a set family is defined, consisting of all the sets that contain the set
Figure PCTCN2015083760-appb-000671
This set family is denoted
Figure PCTCN2015083760-appb-000672
Then
Figure PCTCN2015083760-appb-000673
Figure PCTCN2015083760-appb-000674
so that M(α,β)={I1({2,3}),I2({3,5}),I3({9,10})}.
⑥ N(α,β) operates on the result of the binary function M(α,β)
Figure PCTCN2015083760-appb-000675
that is, for any chosen set
Figure PCTCN2015083760-appb-000676
Figure PCTCN2015083760-appb-000677
if the set
Figure PCTCN2015083760-appb-000678
has a corresponding set family
Figure PCTCN2015083760-appb-000679
then a new set is constructed as follows:
Figure PCTCN2015083760-appb-000680
This gives P[I1({2,3})]={2,3,4,5}, P[I2({3,5})]={2,3,4,5}, P[I3({9,10})]={9,10}.
⑦ U(α) operates on the result of the binary function N(α,β)
Figure PCTCN2015083760-appb-000681
and takes
Figure PCTCN2015083760-appb-000682
Suppose
Figure PCTCN2015083760-appb-000683
such that for any given element γ,
Figure PCTCN2015083760-appb-000684
it holds that τ(γ)≤τ(δ). Then Pmax[I1({2,3})]=5, Pmax[I2({3,5})]=5, Pmax[I3({9,10})]=10.
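The values of P[...] and Pmax[...] stated in steps ⑥ and ⑦ can be reproduced with a short Python sketch. The defining formulas appear only as figures in this text, so the rule used here is an assumption based on the prose of step ⑤: a family collects every retained word string that contains both units of its key pair, and P is the union of the family. The retained word strings and the three key pairs are copied from the example; `family` and `union` are hypothetical helper names.

```python
# Word strings retained by step 4 (sequence numbers), assumed from steps S2.1-S2.2.
retained = [[2, 3], [2, 3, 4, 5], [3, 4, 5], [9, 10]]

# Key pairs of the three families listed in the result of step 5.
keys = [(2, 3), (3, 5), (9, 10)]


def family(pair, spans):
    """All retained word strings that contain both units of the pair."""
    return [span for span in spans if set(pair) <= set(span)]


def union(spans):
    """Sorted union of the sequence numbers in a family."""
    return sorted(set().union(*spans))


P = {pair: union(family(pair, retained)) for pair in keys}
P_max = {pair: max(members) for pair, members in P.items()}

print(P)      # {(2, 3): [2, 3, 4, 5], (3, 5): [2, 3, 4, 5], (9, 10): [9, 10]}
print(P_max)  # {(2, 3): 5, (3, 5): 5, (9, 10): 10}
```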
For r1: {r1}={left}, numbered 6. The corresponding subject is selected as follows. When r_{k-1} does not exist: {y_k}=NPI_{yk}∪VNP_{yk}∪NOMP_k∪G_k∪{e};
where:
Figure PCTCN2015083760-appb-000685
Figure PCTCN2015083760-appb-000686
where:
Figure PCTCN2015083760-appb-000687
Figure PCTCN2015083760-appb-000688
In the formulas above, G_k denotes the union of all parallel noun-pronoun set families whose maximum number is less than the number of the corresponding predicate verb unit.
This gives:
Figure PCTCN2015083760-appb-000689
and:
Figure PCTCN2015083760-appb-000690
Figure PCTCN2015083760-appb-000691
Figure PCTCN2015083760-appb-000692
Figure PCTCN2015083760-appb-000693
The set of subject elements corresponding to r1 is therefore:
Figure PCTCN2015083760-appb-000694
In generating the object element sets {z1} and {z2}, the parallel object generation algorithm runs as follows:
① A(S) extracts all NPI phrases, all VNP phrases, and all OBJP phrases from the original sentence and collects them into one set, denoted Ψ={Jack,Mary,Linda,I,my son,a book}={2,3,5,7,9,10}.
② B(Ψ) takes, in the manner of
Figure PCTCN2015083760-appb-000695
all combinations of any two elements of the set Ψ={2,3,5,7,9,10}. Let the set
Figure PCTCN2015083760-appb-000696
Figure PCTCN2015083760-appb-000697
Then B(Ψ)={{2,3},{2,5},{2,7},{2,9},{2,10},{3,5},{3,7},{3,9},{3,10},{5,7},{5,9},{5,10},{7,9},{7,10},{9,10}}.
③ K(α,β) operates on the result of the unary function B(Ψ): for any given
Figure PCTCN2015083760-appb-000698
arrange the elements
Figure PCTCN2015083760-appb-000699
in ascending order of their syntactic sequence values in the original sentence S. Without loss of generality, let
Figure PCTCN2015083760-appb-000700
The ordered pair
Figure PCTCN2015083760-appb-000701
is then obtained, so that
Figure PCTCN2015083760-appb-000702
The generated ordered pairs are:
{<2,3>,<2,5>,<2,7>,<2,9>,<2,10>,<3,5>,<3,7>,<3,9>,<3,10>,<5,7>,<5,9>,<5,10>,<7,9>,<7,10>,<9,10>}.
Let the set
Figure PCTCN2015083760-appb-000703
and then establish a continuous word-string formula
Figure PCTCN2015083760-appb-000704
where
Figure PCTCN2015083760-appb-000705
is the run of adjacent consecutive words (or the empty word string) in the original sentence S from
Figure PCTCN2015083760-appb-000706
to
Figure PCTCN2015083760-appb-000707
and
Figure PCTCN2015083760-appb-000708
Then
Figure PCTCN2015083760-appb-000709
Figure PCTCN2015083760-appb-000710
so that Φ1=2+<e+<3, Φ2=2+<3+<4+<5, Φ3=2+<3+<4+<5+<6+<7, Φ4=2+<3+<4+<5+<6+<7+<8+<9, Φ5=2+<3+<4+<5+<6+<7+<8+<9+<10, Φ6=3+<4+<5, Φ7=3+<4+<5+<6+<7, Φ8=3+<4+<5+<6+<7+<8+<9, Φ9=3+<4+<5+<6+<7+<8+<9+<10, Φ10=5+<6+<7, Φ11=5+<6+<7+<8+<9, Φ12=5+<6+<7+<8+<9+<10, Φ13=7+<8+<9, Φ14=7+<8+<9+<10, Φ15=9+<e+<10.
④ H(Φt) checks each
Figure PCTCN2015083760-appb-000711
generated by the binary function K(α,β): if, for every element γ∈Φt with
Figure PCTCN2015083760-appb-000712
and
Figure PCTCN2015083760-appb-000713
it holds that γ=NPI or γ=VNP or γ=NOMP or γ=CONJ or γ=e, then the label of Φt is changed to
Figure PCTCN2015083760-appb-000714
and Φt is said to generate
Figure PCTCN2015083760-appb-000715
Let the set
Figure PCTCN2015083760-appb-000716
Then the set
Figure PCTCN2015083760-appb-000717
Figure PCTCN2015083760-appb-000718
⑤ M(α,β) is defined as follows. For any chosen set
Figure PCTCN2015083760-appb-000719
if the set
Figure PCTCN2015083760-appb-000720
has a corresponding
Figure PCTCN2015083760-appb-000721
then a set family is defined, consisting of all the sets that contain the set
Figure PCTCN2015083760-appb-000722
This set family is denoted
Figure PCTCN2015083760-appb-000723
Then
Figure PCTCN2015083760-appb-000724
Figure PCTCN2015083760-appb-000725
so that M(α,β)={I1({2,3}),I2({3,5}),I3({9,10})}.
⑥ N(α,β) operates on the result of the binary function M(α,β)
Figure PCTCN2015083760-appb-000726
that is, for any chosen set
Figure PCTCN2015083760-appb-000727
Figure PCTCN2015083760-appb-000728
if the set
Figure PCTCN2015083760-appb-000729
has a corresponding set family
Figure PCTCN2015083760-appb-000730
then a new set is constructed as follows:
Figure PCTCN2015083760-appb-000731
This gives P[I1({2,3})]={2,3,4,5}, P[I2({3,5})]={2,3,4,5}, P[I3({9,10})]={9,10}.
⑦ V(β) operates on the result of the binary function N(α,β)
Figure PCTCN2015083760-appb-000732
and takes
Figure PCTCN2015083760-appb-000733
Suppose
Figure PCTCN2015083760-appb-000734
such that for any given element γ,
Figure PCTCN2015083760-appb-000735
it holds that τ(δ)≤τ(γ). Then Pmin[I1({2,3})]=2, Pmin[I2({3,5})]=2, Pmin[I3({9,10})]=9.
For r2: {r2}={gave}, numbered 8. The corresponding object is selected as follows. When r_{k+1} does not exist: {z_k}=NPI_{zk}∪VNP_{zk}∪OBJP_k∪H_k∪{e};
where:
Figure PCTCN2015083760-appb-000736
where, when r_{k+1} does not exist:
Figure PCTCN2015083760-appb-000737
Figure PCTCN2015083760-appb-000738
In the formulas above, H_k denotes the union of all parallel noun-pronoun set families whose minimum number is greater than the number of the corresponding predicate verb unit.
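The prose definitions of G_k (families wholly before the predicate verb) and H_k (families wholly after it) can be sketched as simple filters. The snippet below is an assumption-laden illustration: the exact formulas are given only as figures, so it covers just the two cases worked in this example (no preceding predicate verb for r1, no following predicate verb for r2), and it represents each family by the tuple of its sequence numbers so that a combination is kept as a whole. The function names are hypothetical.

```python
# Element sets P[...] of the three families from step 6 (sequence numbers).
families = [(2, 3, 4, 5), (2, 3, 4, 5), (9, 10)]


def subject_families(verb_no, families):
    """G_k sketch: distinct families whose maximum number is below the verb's number."""
    return sorted({f for f in families if max(f) < verb_no})


def object_families(verb_no, families):
    """H_k sketch: distinct families whose minimum number is above the verb's number."""
    return sorted({f for f in families if min(f) > verb_no})


print(subject_families(6, families))  # r1 = left (6):  [(2, 3, 4, 5)]
print(object_families(8, families))   # r2 = gave (8):  [(9, 10)]
```

Under these assumptions the only parallel subject candidate for "left" is the whole string "Jack, Mary and Linda", and the only parallel object candidate for "gave" is "my son" together with "a book".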
This gives:
Figure PCTCN2015083760-appb-000739
and:
Figure PCTCN2015083760-appb-000740
Figure PCTCN2015083760-appb-000741
The set of object elements corresponding to r2 is therefore:
Figure PCTCN2015083760-appb-000742
Figure PCTCN2015083760-appb-000743
Note: During processing, a parallel noun-pronoun combination vector is handled as a whole; other syntactic vectors cannot be inserted into it; and when sequence values are checked, the syntactic sequence values contained in the parallel noun-pronoun combination vector are substituted in directly.
For r1: {r1}={left}
{x1}={After,and,e} (e is the empty string)
Figure PCTCN2015083760-appb-000744
{z1}={f2,e}
Applying the multiplication principle of combinatorics: f1(x1,y1,r1,z1) = (see the list below)
Figure PCTCN2015083760-appb-000745
Figure PCTCN2015083760-appb-000746
For r2: {r2}={gave}
{x2}={After,and,e} (e is the empty string)
Figure PCTCN2015083760-appb-000747
Figure PCTCN2015083760-appb-000748
Applying the multiplication principle of combinatorics: f2(x2,y2,r2,z2) = (see the list below)
Figure PCTCN2015083760-appb-000749
Figure PCTCN2015083760-appb-000750
Figure PCTCN2015083760-appb-000751
Figure PCTCN2015083760-appb-000752
The replacement of constants by sequence values is omitted here.
Applying the multiplication principle of combinatorics:
|S|=|f1|×|f2|=42×108=4536
A total of 4536 possible matrix solutions are therefore generated.
The possible matrix solution that serves as the final result of syntactic structure parsing is:
Figure PCTCN2015083760-appb-000753
Restoring the above possible matrix solution further gives the following form:
Figure PCTCN2015083760-appb-000754
Note: This result is obtained by the overall insertion method.
Converting this matrix to a linear expression:
Figure PCTCN2015083760-appb-000755
Removing e gives:
Figure PCTCN2015083760-appb-000756
Part C5 Example 5
Example 5: As a further example, the following describes how the method of this embodiment parses a sentence with a parallel structure, such as "Linda was singing, and Mary was dancing."
After preprocessing to remove impurities and numbering, the word sequence table for the above sentence is:
Original phrase  Phrase type                    Sequence number
Linda            Noun-pronoun unit              1
was singing      Predicate verb unit            2
and              Coordinate connective unit     3
Mary             Noun-pronoun unit              4
was dancing      Predicate verb unit            5
For r1: {r1}={was singing}
{x1}={e} (e is the empty string)
{y1}={Linda,e}
{z1}={Mary,f2,e}
Applying the multiplication principle of combinatorics: f1(x1,y1,r1,z1) = (see the list below)
Serial number  Row matrix f1
(1-1)  f1=(e,Linda,r1,Mary)
(1-2)  f1=(e,e,r1,Mary)
(1-3)  f1=(e,Linda,r1,f2)
(1-4)  f1=(e,e,r1,f2)
(1-5)  f1=(e,Linda,r1,e)
(1-6)  f1=(e,e,r1,e)
For r2: {r2}={was dancing}
{x2}={and,e} (e is the empty string)
{y2}={Linda,Mary,f1,e}
{z2}={e}
Applying the multiplication principle of combinatorics: f2(x2,y2,r2,z2) = (see the list below)
Serial number  Row matrix f2
(2-1)  f2=(and,Linda,r2,e)
(2-2)  f2=(and,Mary,r2,e)
(2-3)  f2=(and,f1,r2,e)
(2-4)  f2=(and,e,r2,e)
(2-5)  f2=(e,Linda,r2,e)
(2-6)  f2=(e,Mary,r2,e)
(2-7)  f2=(e,f1,r2,e)
(2-8)  f2=(e,e,r2,e)
The replacement of constants by sequence values is omitted here.
Applying the multiplication principle of combinatorics:
|S|=|f1|×|f2|=6×8=48
A total of 48 possible matrix solutions are therefore generated.
Restoring the numbers in the above possible matrix solution to word units gives the following form:
Figure PCTCN2015083760-appb-000757
Converting this matrix to a linear expression:
Figure PCTCN2015083760-appb-000758
Removing e gives:
Figure PCTCN2015083760-appb-000759
Part C6 Example 6
Example 6: As a further example, the following describes how the method of this embodiment parses a sentence with a parallel structure, such as "I know that you have a car and that he has a bike."
After preprocessing to remove impurities and numbering, the word sequence table for the above sentence is:
Original phrase  Phrase type                    Sequence number
I                Noun-pronoun unit              1
know             Predicate verb unit            2
that A           Subordinate connective unit    3
you              Noun-pronoun unit              4
have             Predicate verb unit            5
a car            Noun-pronoun unit              6
and              Coordinate connective unit     7
that B           Subordinate connective unit    8
he               Noun-pronoun unit              9
has              Predicate verb unit            10
a bike           Noun-pronoun unit              11
The set of leader elements corresponding to each predicate verb unit r_k is written as:
{x_k}=Lead_k∪conj_k∪(conj_k∘Lead_k)∪{e}
Let x_k denote the leader element corresponding to predicate verb unit r_k, and let {x_k} denote its set of possible values. The sets of possible values are generated as follows:
{x2}=Lead2∪{e}={that A,e};
{x3}=Lead3∪conj3∪(conj3∘Lead3)∪{e}={that A,and,that B,Ψ,e}.
The two formulas above follow from the generation algorithm for leader elements: {x_k}=Lead_k∪conj_k∪(conj_k∘Lead_k)∪{e}, where (conj_k∘Lead_k)={R_k | R_k=conj+<Lead, conj<r_k, Lead<r_k, τ(Lead)=τ(conj)+1}. That is, (conj_k∘Lead_k) denotes the set of connective combination vectors, each formed from a coordinate connective unit whose number is less than that of the corresponding predicate verb unit and an adjacent subordinate connective unit whose number is less than that of the predicate verb unit and greater than that of the coordinate connective unit.
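The leader-element formula can be illustrated on the word sequence table of Example 6. The Python sketch below is a reading of the formula under stated assumptions: Lead_k and conj_k are taken to be all subordinate and coordinate connective units numbered below the predicate verb (their definitions are not repeated in this passage), the type tags and the function `leader_candidates` are hypothetical, and the combination written Ψ in the text is rendered as the string "and+<that B".

```python
E = "e"  # empty-string element

# (sequence number, word unit, unit type) for
# "I know that you have a car and that he has a bike."
units = [
    (1, "I", "NP"), (2, "know", "V"), (3, "that A", "SUB"), (4, "you", "NP"),
    (5, "have", "V"), (6, "a car", "NP"), (7, "and", "CONJ"),
    (8, "that B", "SUB"), (9, "he", "NP"), (10, "has", "V"), (11, "a bike", "NP"),
]


def leader_candidates(verb_no, units):
    """Lead ∪ conj ∪ (conj∘Lead) ∪ {e} for the predicate verb numbered verb_no."""
    lead = [(n, w) for n, w, kind in units if kind == "SUB" and n < verb_no]
    conj = [(n, w) for n, w, kind in units if kind == "CONJ" and n < verb_no]
    # conj∘Lead: a coordinate connective immediately followed by a subordinate one.
    combos = [f"{cw}+<{lw}" for cn, cw in conj for ln, lw in lead if ln == cn + 1]
    return [w for _, w in lead] + [w for _, w in conj] + combos + [E]


print(leader_candidates(2, units))   # ['e']
print(leader_candidates(5, units))   # ['that A', 'e']
print(leader_candidates(10, units))  # ['that A', 'that B', 'and', 'and+<that B', 'e']
```

The three results have 1, 2 and 5 elements, matching the sizes of the leader sets listed for "know", "have" and "has" in this example.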
For r1: {r1}={know}
{x1}={e} (e is the empty string)
{y1}={I,e}
{z1}={you,f2,f3,e}
Applying the multiplication principle of combinatorics: f1(x1,y1,r1,z1) = (see the list below)
Serial number  Row matrix f1            Serial number  Row matrix f1
(1-1)  f1=(e,I,r1,you)          (1-5)  f1=(e,I,r1,f3)
(1-2)  f1=(e,e,r1,you)          (1-6)  f1=(e,e,r1,f3)
(1-3)  f1=(e,I,r1,f2)           (1-7)  f1=(e,I,r1,e)
(1-4)  f1=(e,e,r1,f2)           (1-8)  f1=(e,e,r1,e)
For r2: {r2}={have}
{x2}={that A,e} (e is the empty string)
{y2}={you,I,f1,e}
{z2}={a car,f3,e}
Applying the multiplication principle of combinatorics: f2(x2,y2,r2,z2) = (see the list below)
Serial number  Row matrix f2                  Serial number  Row matrix f2
(2-1)   f2=(that A,you,r2,a car)      (2-13)  f2=(that A,f1,r2,f3)
(2-2)   f2=(e,you,r2,a car)           (2-14)  f2=(e,f1,r2,f3)
(2-3)   f2=(that A,I,r2,a car)        (2-15)  f2=(that A,e,r2,f3)
(2-4)   f2=(e,I,r2,a car)             (2-16)  f2=(e,e,r2,f3)
(2-5)   f2=(that A,f1,r2,a car)       (2-17)  f2=(that A,you,r2,e)
(2-6)   f2=(e,f1,r2,a car)            (2-18)  f2=(e,you,r2,e)
(2-7)   f2=(that A,e,r2,a car)        (2-19)  f2=(that A,I,r2,e)
(2-8)   f2=(e,e,r2,a car)             (2-20)  f2=(e,I,r2,e)
(2-9)   f2=(that A,you,r2,f3)         (2-21)  f2=(that A,f1,r2,e)
(2-10)  f2=(e,you,r2,f3)              (2-22)  f2=(e,f1,r2,e)
(2-11)  f2=(that A,I,r2,f3)           (2-23)  f2=(that A,e,r2,e)
(2-12)  f2=(e,I,r2,f3)                (2-24)  f2=(e,e,r2,e)
For r3: {r3} = {has}
{x3} = {that A, that B, and, Ψ, e} (e is the empty string)
{y3} = {you, I, a car, he, f1, f2, e}
{z3} = {a bike, e}
Applying the multiplication principle of combinatorics, f3(x3, y3, r3, z3) takes the values listed below:
No. | Row matrix f3 | No. | Row matrix f3
(3-1) | f3 = (that A, you, r3, a bike) | (3-36) | f3 = (that A, you, r3, e)
(3-2) | f3 = (that B, you, r3, a bike) | (3-37) | f3 = (that B, you, r3, e)
(3-3) | f3 = (and, you, r3, a bike) | (3-38) | f3 = (and, you, r3, e)
(3-4) | f3 = (Ψ, you, r3, a bike) | (3-39) | f3 = (Ψ, you, r3, e)
(3-5) | f3 = (e, you, r3, a bike) | (3-40) | f3 = (e, you, r3, e)
(3-6) | f3 = (that A, I, r3, a bike) | (3-41) | f3 = (that A, I, r3, e)
(3-7) | f3 = (that B, I, r3, a bike) | (3-42) | f3 = (that B, I, r3, e)
(3-8) | f3 = (and, I, r3, a bike) | (3-43) | f3 = (and, I, r3, e)
(3-9) | f3 = (Ψ, I, r3, a bike) | (3-44) | f3 = (Ψ, I, r3, e)
(3-10) | f3 = (e, I, r3, a bike) | (3-45) | f3 = (e, I, r3, e)
(3-11) | f3 = (that A, a car, r3, a bike) | (3-46) | f3 = (that A, a car, r3, e)
(3-12) | f3 = (that B, a car, r3, a bike) | (3-47) | f3 = (that B, a car, r3, e)
(3-13) | f3 = (and, a car, r3, a bike) | (3-48) | f3 = (and, a car, r3, e)
(3-14) | f3 = (Ψ, a car, r3, a bike) | (3-49) | f3 = (Ψ, a car, r3, e)
(3-15) | f3 = (e, a car, r3, a bike) | (3-50) | f3 = (e, a car, r3, e)
(3-16) | f3 = (that A, he, r3, a bike) | (3-51) | f3 = (that A, he, r3, e)
(3-17) | f3 = (that B, he, r3, a bike) | (3-52) | f3 = (that B, he, r3, e)
(3-18) | f3 = (and, he, r3, a bike) | (3-53) | f3 = (and, he, r3, e)
(3-19) | f3 = (Ψ, he, r3, a bike) | (3-54) | f3 = (Ψ, he, r3, e)
(3-20) | f3 = (e, he, r3, a bike) | (3-55) | f3 = (e, he, r3, e)
(3-21) | f3 = (that A, f1, r3, a bike) | (3-56) | f3 = (that A, f1, r3, e)
(3-22) | f3 = (that B, f1, r3, a bike) | (3-57) | f3 = (that B, f1, r3, e)
(3-23) | f3 = (and, f1, r3, a bike) | (3-58) | f3 = (and, f1, r3, e)
(3-24) | f3 = (Ψ, f1, r3, a bike) | (3-59) | f3 = (Ψ, f1, r3, e)
(3-25) | f3 = (e, f1, r3, a bike) | (3-60) | f3 = (e, f1, r3, e)
(3-26) | f3 = (that A, f2, r3, a bike) | (3-61) | f3 = (that A, f2, r3, e)
(3-27) | f3 = (that B, f2, r3, a bike) | (3-62) | f3 = (that B, f2, r3, e)
(3-28) | f3 = (and, f2, r3, a bike) | (3-63) | f3 = (and, f2, r3, e)
(3-29) | f3 = (Ψ, f2, r3, a bike) | (3-64) | f3 = (Ψ, f2, r3, e)
(3-30) | f3 = (e, f2, r3, a bike) | (3-65) | f3 = (e, f2, r3, e)
(3-31) | f3 = (that A, e, r3, a bike) | (3-66) | f3 = (that A, e, r3, e)
(3-32) | f3 = (that B, e, r3, a bike) | (3-67) | f3 = (that B, e, r3, e)
(3-33) | f3 = (and, e, r3, a bike) | (3-68) | f3 = (and, e, r3, e)
(3-34) | f3 = (Ψ, e, r3, a bike) | (3-69) | f3 = (Ψ, e, r3, e)
(3-35) | f3 = (e, e, r3, a bike) | (3-70) | f3 = (e, e, r3, e)
The step of replacing the constants with their order values is omitted here.
Applying the multiplication principle of combinatorics:
|S| = |f1| × |f2| × |f3| = 8 × 24 × 70 = 13440
A total of 13440 possible matrix solutions are therefore generated.
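The count above can be reproduced with a short sketch that builds each candidate set as a Cartesian product. This is only an illustration, not the patent's implementation: the names `SLOTS`, `candidate_vectors`, and `matrix_solutions` are hypothetical, and the slot sets for f1 are inferred from rows (1-1) to (1-8) rather than quoted from the text.

```python
from itertools import product
from math import prod

# Candidate values for the four slots (guide, subject, predicate, object) of
# each syntactic vector. The f2 and f3 sets are copied from the text; the f1
# sets are inferred from rows (1-1) to (1-8) and are an assumption here.
SLOTS = {
    "f1": (["e"], ["I", "e"], ["r1"], ["you", "f2", "f3", "e"]),
    "f2": (["that A", "e"], ["you", "I", "f1", "e"], ["r2"], ["a car", "f3", "e"]),
    "f3": (
        ["that A", "that B", "and", "Ψ", "e"],
        ["you", "I", "a car", "he", "f1", "f2", "e"],
        ["r3"],
        ["a bike", "e"],
    ),
}


def candidate_vectors(slots):
    """Return every row vector obtainable by choosing one value per slot."""
    return list(product(*slots))


VECTORS = {name: candidate_vectors(slots) for name, slots in SLOTS.items()}
SIZES = {name: len(rows) for name, rows in VECTORS.items()}
TOTAL = prod(SIZES.values())


def matrix_solutions():
    """Lazily yield every possible matrix solution (one row per predicate verb)."""
    return product(VECTORS["f1"], VECTORS["f2"], VECTORS["f3"])


if __name__ == "__main__":
    print(SIZES, TOTAL)
```

Generating the solutions lazily keeps memory use flat, which matters because the number of candidates grows multiplicatively with each additional predicate verb.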
The following possible matrix solution is obtained as the final result of the syntactic structure parsing:
Figure PCTCN2015083760-appb-000760
The linear expression of S corresponding to this matrix expression is:
Figure PCTCN2015083760-appb-000761
The correct structure of the example sentence is as follows: "I know" is the main clause; "that A you have a car" is the first object clause governed by the main-clause predicate "know"; "and that B he has a bike" is the second object clause, coordinate with the first; the subordinating connective units "that A" and "that B" introduce the two object clauses respectively; the two object clauses are joined by the coordinating connective unit "and"; and during processing the connective combination vector Ψ = and that B is handled as a single unit.
No other syntactic vector may be inserted into the connective combination vector Ψ; when order values are checked, the two order values it contains are simply substituted in directly. In the final result, the second object clause is treated as having been inserted, as a whole, at the end of the first object clause.
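The check described above can be sketched as follows. Because the final matrix solution is given only as a figure, the vectors `f1`, `f2`, and `f3` below are an assumed reading of it based on the prose, and the order values in `ORDER` (1 to 11, in sentence order) are likewise an assumption, since the text omits the order-value replacement step. All function names are hypothetical.

```python
# Assumed order values of the word units in the preprocessed example sentence.
ORDER = {
    "I": 1, "know": 2, "that A": 3, "you": 4, "have": 5, "a car": 6,
    "and": 7, "that B": 8, "he": 9, "has": 10, "a bike": 11,
}

PSI = ("and", "that B")  # connective combination vector, handled as one unit

# Assumed final solution: f2 is substituted into the object slot of f1, and
# f3 has no substitution relation with the others, so it is inserted whole.
f2 = ["that A", "you", "have", "a car"]
f1 = ["e", "I", "know", f2]
f3 = [PSI, "he", "has", "a bike"]


def flatten(vector):
    """Expand nested vectors and Ψ into a flat list of order values."""
    values = []
    for item in vector:
        if item == "e":  # the empty unit contributes nothing
            continue
        if isinstance(item, (list, tuple)):
            values.extend(flatten(item))
        else:
            values.append(ORDER[item])
    return values


def insert_whole(host, guest, gap):
    """Insert the guest vector, as a whole, into one gap of the host vector."""
    return host[:gap] + guest + host[gap:]


def matches_sentence(values, unit_count):
    """True when the order values read 1, 2, ..., unit_count with none missing."""
    return values == list(range(1, unit_count + 1))


host = flatten(f1)    # main clause with the first object clause substituted in
guest = flatten(f3)   # second object clause, led by Ψ
final_row = insert_whole(host, guest, len(host))
```

Inserting `guest` at any gap other than the end of `host` breaks the ascending order, which is how the other placements would be rejected.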
FIG. 2 is a schematic diagram of a computer-based apparatus for parsing natural language syntactic structures according to the present invention. The apparatus comprises:
a reading component 21, configured to read the preprocessed sentence data structure to be parsed, wherein the preprocessed sentence data structure contains only the coordinating connective units, subordinating connective units, predicate verb units, and noun/pronoun units of the sentence, and each word unit is numbered according to its order in the preprocessed sentence and labeled with its type;
an element generating component 22, configured to generate, for each predicate verb unit, a corresponding guide element, subject element, predicate element, and object element;
wherein the possible values of the guide element are: one of the coordinating connective units or subordinating connective units whose number is smaller than that of the corresponding predicate verb unit; or one of the connective combination vectors, each formed by a coordinating connective unit whose number is smaller than that of the corresponding predicate verb unit together with an adjacent subordinating connective unit whose number is smaller than that of the corresponding predicate verb unit and greater than that of the coordinating connective unit; or the empty unit;
the possible values of the subject element are: one of the noun/pronoun units whose number is smaller than that of the corresponding predicate verb unit; or one of the coordinate noun/pronoun combination vectors in the family of all coordinate noun/pronoun combination vectors whose largest word-unit number is smaller than that of the corresponding predicate verb unit; or one of the syntactic vectors corresponding to a preceding predicate element; or the empty unit;
the predicate element is the corresponding predicate verb unit;
the possible values of the object element are: one of the noun/pronoun units whose number is greater than that of the corresponding predicate verb unit and smaller than that of the next predicate verb unit; or one of the coordinate noun/pronoun combination vectors in the family of all coordinate noun/pronoun combination vectors whose smallest word-unit number is greater than that of the corresponding predicate verb unit and smaller than that of the next predicate verb unit; or one of the syntactic vectors corresponding to a following predicate element; or the empty unit;
a vector generating component 23, configured to obtain, from the possible values of the guide element, subject element, predicate element, and object element, all possible values of the syntactic vector corresponding to each predicate verb unit, the syntactic vector comprising a guide element, a subject element, a predicate element, and an object element;
a matrix generating component 24, configured to generate at least one possible matrix solution of the syntactic structure from all possible values of all syntactic vectors, the possible matrix solution consisting of syntactic vectors arranged in the order of the predicate verb unit numbers;
a solving component 25, configured to verify whether the sentence obtained from a possible matrix solution is identical to the preprocessed sentence and, if it is, to take the syntactic vectors in that possible matrix solution as one of the syntactic structure parsing results;
wherein the solving component 25 excludes non-conforming possible solutions through the operation of the following modules:
a first exclusion module, which excludes a possible matrix solution if there is an order value that does not appear in it;
a second exclusion module, which excludes a possible matrix solution if the same order value appears in different syntactic vectors or identical syntactic vectors appear;
a third exclusion module, which, in each possible matrix solution, performs equivalent substitution on all syntactic vectors that have substitution relations with other syntactic vectors, and excludes the possible matrix solution if a crossing contradiction between two syntactic vectors appears after the substitution;
a fourth exclusion module, which, in each possible matrix solution, performs equivalent substitution on all syntactic vectors that have substitution relations with other syntactic vectors, and excludes the possible matrix solution if two order values in reversed positions appear after the substitution;
a fifth exclusion module, which, in any possible matrix solution containing a syntactic vector that has no substitution relation with the other syntactic vectors, performs gap insertion to obtain all possible syntactic parsing structures corresponding to that possible matrix solution, and verifies whether the sentence obtained from each possible syntactic parsing structure is identical to the preprocessed sentence; the fifth exclusion module further comprises:
a first sub-module, which first performs equivalent substitution on the syntactic vectors in the possible matrix solution that have substitution relations with one another, thereby converting the possible matrix solution into a set of syntactic vectors with no substitution relations among them,
Figure PCTCN2015083760-appb-000762
The syntactic vectors in the possible matrix solution are called first-type syntactic vectors, and the converted syntactic vectors
Figure PCTCN2015083760-appb-000763
are called second-type syntactic vectors;
a second sub-module, which takes any second-type syntactic vector
Figure PCTCN2015083760-appb-000764
and labels, one by one in the predetermined direction, the order value of each syntactic element in
Figure PCTCN2015083760-appb-000765
After the order values are labeled, it takes the i-th syntactic element of
Figure PCTCN2015083760-appb-000766
and constructs a single gap only on the first side of that element; after the gap is constructed, setting aside the syntactic vector
Figure PCTCN2015083760-appb-000767
it takes any other second-type syntactic vector
Figure PCTCN2015083760-appb-000768
and inserts the syntactic vector
Figure PCTCN2015083760-appb-000769
as a whole into the constructed gap, generating a new syntactic vector, which is denoted
Figure PCTCN2015083760-appb-000770
Syntactic vectors obtained by such whole-vector gap insertion are collectively called third-type syntactic vectors;
a third sub-module, which, for a third-type syntactic vector
Figure PCTCN2015083760-appb-000771
labels order values in the predetermined direction on every syntactic element, starting from the first syntactic element on the first side of the vector
Figure PCTCN2015083760-appb-000772
and ending, within the vector
Figure PCTCN2015083760-appb-000773
at the contained vector
Figure PCTCN2015083760-appb-000774
specifically at the first syntactic element on the second side of that contained vector. Elements located, within the vector
Figure PCTCN2015083760-appb-000775
on the first side of the contained vector
Figure PCTCN2015083760-appb-000776
are not labeled with order values. The first syntactic element on the second side of the vector
Figure PCTCN2015083760-appb-000777
is denoted
Figure PCTCN2015083760-appb-000778
The part of the vector
Figure PCTCN2015083760-appb-000779
labeled in this way is denoted the tail syntactic vector
Figure PCTCN2015083760-appb-000780
After the order values are labeled, it takes the j-th syntactic element of the tail syntactic vector and constructs a single gap only on the first side of that element; after the gap is constructed, it takes any unused second-type syntactic vector
Figure PCTCN2015083760-appb-000781
and inserts the syntactic vector
Figure PCTCN2015083760-appb-000782
as a whole into the constructed gap, generating a new syntactic vector, which is denoted
Figure PCTCN2015083760-appb-000783
or, alternatively,
for a third-type syntactic vector
Figure PCTCN2015083760-appb-000784
labels order values in the predetermined direction on every syntactic element of the syntactic vector
Figure PCTCN2015083760-appb-000785
After the order values are labeled, it takes the t-th syntactic element of
Figure PCTCN2015083760-appb-000786
and constructs a single gap on the first side of that element; after the gap is constructed, it takes any unused second-type syntactic vector
Figure PCTCN2015083760-appb-000787
and inserts that vector
Figure PCTCN2015083760-appb-000788
as a whole into the gap constructed above, generating a new vector, which is denoted
Figure PCTCN2015083760-appb-000789
a fourth sub-module, which repeats the operation of the third sub-module: each time a gap-construction and gap-insertion step finishes, it performs the next gap construction and gap insertion on the third-type syntactic vector produced by that step, until all second-type syntactic vectors
Figure PCTCN2015083760-appb-000790
have been inserted, finally yielding a single-row third-type syntactic vector, which is called the final single-row vector;
a fifth sub-module, which excludes a possible syntactic parsing structure if every final single-row vector corresponding to it contains two order values in reversed positions;
a sixth sub-module, which repeats the operations of the second through fifth sub-modules until all possible syntactic parsing structures have been traversed.
Further, the apparatus may also comprise:
a result display component, which displays the syntactic vectors in the syntactic structure parsing result, together with the corresponding syntactic structural relations, as a tree structure on a human-computer interaction interface.
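The first and second exclusion modules of the solving component 25 reduce to set and multiplicity checks on order values. The sketch below is one possible reading, not the patent's implementation: the order values in `ORDER` are assumed for the example sentence, the helper names are hypothetical, and "identical syntactic vectors appear" is interpreted here as the same vector symbol (f1, f2, f3) being referenced more than once.

```python
from collections import Counter

# Assumed order values for the example sentence; r1, r2, r3 stand for the
# predicate verbs know, have, has.
ORDER = {
    "I": 1, "r1": 2, "that A": 3, "you": 4, "r2": 5, "a car": 6,
    "and": 7, "that B": 8, "he": 9, "r3": 10, "a bike": 11,
}
COMBOS = {"Ψ": ("and", "that B")}   # combination vectors expand to their units
VECTOR_NAMES = {"f1", "f2", "f3"}


def used_order_values(matrix):
    """Collect the order value of every word unit appearing in the matrix."""
    values = []
    for vector in matrix:
        for item in vector:
            for unit in COMBOS.get(item, (item,)):
                if unit in ORDER:        # skips "e" and vector references
                    values.append(ORDER[unit])
    return values


def passes_first_module(matrix):
    """Reject a matrix in which some order value never appears."""
    return set(used_order_values(matrix)) == set(ORDER.values())


def passes_second_module(matrix):
    """Reject a matrix with a repeated order value or a repeated vector reference."""
    value_counts = Counter(used_order_values(matrix))
    ref_counts = Counter(
        item for vector in matrix for item in vector if item in VECTOR_NAMES
    )
    return all(n == 1 for n in value_counts.values()) and all(
        n == 1 for n in ref_counts.values()
    )
```

Both checks are cheap, so running them before the substitution and gap-insertion modules discards most of the candidate matrices early.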
The present invention focuses on the accurate parsing of compound sentence structures in natural language. Its main features are: (1) it makes full use of the properties of composite functions; (2) it uses a matrix model and a linear model to describe syntactic formulas; (3) it applies principles of combinatorics to generate the matrix model and the linear model. Using the invention can improve the accuracy of natural language syntactic structure parsing.
From a mathematical point of view, natural language is discrete in character, and this is precisely what makes syntactic structure parsing difficult. By combining syntactic vectors with a matrix form, the invention preserves the integrity of the sentence structure without hindering analysis of the internal constituents of each clause or of the relations between words and clauses. The invention uses a matrix model and a linear model to express sentence formulas, which both fits the discrete character of natural language and effectively reveals the informational links within the syntactic structure.
From the perspective of computer technology, the invention uses the matrix model and the linear model to convert a single-line natural language sentence into a hierarchical, linearly nested form. This largely avoids the confusion that arises when a computer labels constituents and divides structure directly on the original sentence, and it makes the program's task clearer and simpler. The matrix model and linear model are comparable to laying out multiple parallel tracks for a natural language sentence, letting the sentence start on all tracks at once and then selecting the correct results; equivalently, they provide multiple planes on which the sentence is processed simultaneously before the correct results are selected.
In generating the matrices, the invention applies principles of combinatorics to generate all matrices and then eliminates them one by one, finally obtaining at least one possibly correct syntactic structure parsing result. This process requires only mathematical principles and information encoding and handles only real-number values: every step ultimately comes down to checking whether the values of a syntactic vector are in ascending order, that is, comparing the magnitudes of real numbers, without involving the linguistic information of English itself.
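To illustrate how the elimination reduces to comparing numbers, the following sketch enumerates whole-vector gap insertions over vectors that have already been reduced to lists of order values, and keeps only the ascending results. It is a simplified illustration under stated assumptions: it tries every gap, including both ends, and does not model the tail-vector restriction on where gaps may be constructed; the function names are hypothetical.

```python
def is_ascending(values):
    """True when every order value is smaller than the one after it."""
    return all(a < b for a, b in zip(values, values[1:]))


def insert_whole(host, guest, gap):
    """Insert the guest vector, as a whole, into one gap of the host vector."""
    return host[:gap] + guest + host[gap:]


def single_row_vectors(vectors):
    """Yield every single-row vector reachable by repeated whole-vector insertion."""

    def extend(current, remaining):
        if not remaining:
            yield current
            return
        for k, guest in enumerate(remaining):
            rest = remaining[:k] + remaining[k + 1:]
            for gap in range(len(current) + 1):
                yield from extend(insert_whole(current, guest, gap), rest)

    for k, first in enumerate(vectors):
        yield from extend(first, vectors[:k] + vectors[k + 1:])


def surviving_rows(vectors):
    """Keep only the final single-row vectors whose order values ascend."""
    return [row for row in single_row_vectors(vectors) if is_ascending(row)]
```

A structure such as `[[1, 3], [2, 4]]` has no surviving row, because the two vectors cross and no whole-vector insertion can untangle them, whereas `[[1, 2, 6], [3, 4, 5]]` survives with the second vector nested inside the first.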
At the same time, the invention requires a large number of mathematical operations and can therefore be implemented effectively only with the computing power of a computer.
In summary, the invention draws on the mathematical principles of abstract algebra, set theory, combinatorics, computability theory, and computational linguistics, together with the corresponding computer techniques. It applies the mathematical idea of composite functions and parses natural language syntactic structure by building a matrix model and a linear model and constructing recursive functions; the important conclusions are proved using methods such as mathematical induction.
The invention is original in conception, ingenious in method, and thoroughly argued; it makes full use of the principles of mathematics and computer science, and the method is comparatively accurate and technically demanding.

Claims (8)

  1. A computer-based method for parsing natural language syntactic structures, comprising:
    S1. reading the preprocessed sentence data structure to be parsed, wherein the preprocessed sentence data structure contains only the coordinating connective units, subordinating connective units, predicate verb units, and noun/pronoun units of the sentence, and each word unit is numbered according to its order in the preprocessed sentence and labeled with its type;
    S2. for each predicate verb unit, generating a corresponding guide element, subject element, predicate element, and object element;
    wherein the possible values of the guide element are: one of the coordinating connective units or subordinating connective units whose number is smaller than that of the corresponding predicate verb unit; or one of the connective combination vectors, each formed by a coordinating connective unit whose number is smaller than that of the corresponding predicate verb unit together with an adjacent subordinating connective unit whose number is smaller than that of the corresponding predicate verb unit and greater than that of the coordinating connective unit; or the empty unit;
    the possible values of the subject element are: one of the noun/pronoun units whose number is smaller than that of the corresponding predicate verb unit; or one of the coordinate noun/pronoun combination vectors in the family of all coordinate noun/pronoun combination vectors whose largest word-unit number is smaller than that of the corresponding predicate verb unit; or one of the syntactic vectors corresponding to a preceding predicate element; or the empty unit;
    the predicate element is the corresponding predicate verb unit;
    the possible values of the object element are: one of the noun/pronoun units whose number is greater than that of the corresponding predicate verb unit and smaller than that of the next predicate verb unit; or one of the coordinate noun/pronoun combination vectors in the family of all coordinate noun/pronoun combination vectors whose smallest word-unit number is greater than that of the corresponding predicate verb unit and smaller than that of the next predicate verb unit; or one of the syntactic vectors corresponding to a following predicate element; or the empty unit;
    S3. obtaining, from the possible values of the guide element, subject element, predicate element, and object element, all possible values of the syntactic vector corresponding to each predicate verb unit, the syntactic vector comprising a guide element, a subject element, a predicate element, and an object element;
    S4. generating at least one possible matrix solution of the syntactic structure from all possible values of all syntactic vectors, the possible matrix solution consisting of syntactic vectors arranged in the order of the predicate verb unit numbers;
    S5. verifying whether the sentence obtained from a possible matrix solution is identical to the preprocessed sentence and, if it is, taking the syntactic vectors in that possible matrix solution as one of the syntactic structure parsing results;
    wherein S5 comprises performing the following operations in order, to exclude non-conforming possible solutions:
    S5.1. if there is an order value that does not appear in a possible matrix solution, excluding that possible matrix solution;
    S5.2. if the same order value appears in different syntactic vectors or identical syntactic vectors appear, excluding that possible matrix solution;
    S5.3. in each possible matrix solution, performing equivalent substitution on all syntactic vectors that have substitution relations with other syntactic vectors, and excluding the possible matrix solution if a crossing contradiction between two syntactic vectors appears after the substitution;
    S5.4. in each possible matrix solution, performing equivalent substitution on all syntactic vectors that have substitution relations with other syntactic vectors, and excluding the possible matrix solution if two order values in reversed positions appear after the substitution;
    S5.5. in any possible matrix solution containing a syntactic vector that has no substitution relation with the other syntactic vectors, performing gap insertion to obtain all possible syntactic parsing structures corresponding to that possible matrix solution, and verifying whether the sentence obtained from each possible syntactic parsing structure is identical to the preprocessed sentence, which further comprises:
    S5.5.1. first performing equivalent substitution on the syntactic vectors in the possible matrix solution that have substitution relations with one another, thereby converting the possible matrix solution into a set of syntactic vectors with no substitution relations among them,
    Figure PCTCN2015083760-appb-100001
    The syntactic vectors in the possible matrix solution are called first-type syntactic vectors, and the converted syntactic vectors
    Figure PCTCN2015083760-appb-100002
    are called second-type syntactic vectors;
    S5.5.2. taking any second-type syntactic vector
    Figure PCTCN2015083760-appb-100003
    and labeling, one by one in the predetermined direction, the order value of each syntactic element in
    Figure PCTCN2015083760-appb-100004
    After the order values are labeled, taking the i-th syntactic element of
    Figure PCTCN2015083760-appb-100005
    and constructing a single gap only on the first side of that element; after the gap is constructed, setting aside the syntactic vector
    Figure PCTCN2015083760-appb-100006
    taking any other second-type syntactic vector
    Figure PCTCN2015083760-appb-100007
    and inserting the syntactic vector
    Figure PCTCN2015083760-appb-100008
    as a whole into the constructed gap, generating a new syntactic vector, which is denoted
    Figure PCTCN2015083760-appb-100009
    Syntactic vectors obtained by such whole-vector gap insertion are collectively called third-type syntactic vectors;
    S5.5.3. For a third-type syntactic vector
    Figure PCTCN2015083760-appb-100010
    label with order values, in the predetermined direction, every syntactic element starting from the first syntactic element on the first side of the vector
    Figure PCTCN2015083760-appb-100011
    and ending, within the vector
    Figure PCTCN2015083760-appb-100012
    at the first syntactic element on the second side of the contained vector
    Figure PCTCN2015083760-appb-100013
    The elements that lie, within the vector
    Figure PCTCN2015083760-appb-100014
    on the first side of the contained vector
    Figure PCTCN2015083760-appb-100015
    are not labeled with order values. The first syntactic element on the second side of the vector
    Figure PCTCN2015083760-appb-100016
    is denoted
    Figure PCTCN2015083760-appb-100017
    and the part of the vector
    Figure PCTCN2015083760-appb-100018
    that is labeled in the foregoing manner is denoted the tail syntactic vector
    Figure PCTCN2015083760-appb-100019
    After the order values have been labeled, take any j-th syntactic element in the tail syntactic vector and construct a single gap only on the first side of that element; after the gap has been constructed, take any unused second-type syntactic vector
    Figure PCTCN2015083760-appb-100020
    and insert the syntactic vector
    Figure PCTCN2015083760-appb-100021
    as a whole into the constructed gap, thereby generating a new syntactic vector, which is denoted
    Figure PCTCN2015083760-appb-100022
    or
    for a third-type syntactic vector
    Figure PCTCN2015083760-appb-100023
    label with order values, in the predetermined direction, every syntactic element in the syntactic vector
    Figure PCTCN2015083760-appb-100024
    After the order values of the syntactic elements have been labeled, take any t-th syntactic element in
    Figure PCTCN2015083760-appb-100025
    and construct a single gap on the first side of that syntactic element; after the gap has been constructed, take any unused second-type syntactic vector
    Figure PCTCN2015083760-appb-100026
    and insert the vector
    Figure PCTCN2015083760-appb-100027
    as a whole into the previously constructed gap, thereby generating a new vector, which is denoted
    Figure PCTCN2015083760-appb-100028
    S5.5.4. Repeat S5.5.3: each time a gap-construction and gap-insertion step finishes, perform the next gap-construction and gap-insertion operation on the third-type syntactic vector obtained from the previous step, until all second-type syntactic vectors
    Figure PCTCN2015083760-appb-100029
    have been inserted, finally yielding a single-row third-type syntactic vector; this finally obtained third-type syntactic vector is called the final single-row vector;
    S5.5.5. If every final single-row vector corresponding to a possible syntactic parsing structure contains two order values whose positions are reversed, exclude that possible syntactic parsing structure;
    S5.5.6. Repeat S5.5.2 to S5.5.5 until all possible syntactic parsing structures have been traversed.
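The gap-insertion check of S5.5.2 to S5.5.5 can be illustrated with a minimal Python sketch. It rests on several assumptions that are not stated in the claims: each second-type syntactic vector is flattened to a list of order values, the "first side" is the left side, and the tail-vector restriction of S5.5.3 is ignored, so every gap position is tried. The function names are hypothetical.

```python
from itertools import permutations


def insert_whole(host, i, guest):
    """Open a gap on the left of host[i] and insert guest as one block."""
    return host[:i] + guest + host[i:]


def has_reversal(row):
    """True if two order values appear in reversed positions."""
    return any(a > b for a, b in zip(row, row[1:]))


def final_single_row_vectors(vectors):
    """Yield every single-row vector reachable by repeated whole insertion.

    Assumes a non-empty list of vectors. Simplification: all gap positions
    are tried, not only those inside the tail vector of S5.5.3.
    """
    for first, *rest in permutations(vectors):
        rows = [list(first)]
        for guest in rest:
            rows = [insert_whole(row, i, list(guest))
                    for row in rows
                    for i in range(len(row))]
        yield from rows


def structure_survives(vectors):
    """S5.5.5: exclude only if every final single-row vector has a reversal."""
    return any(not has_reversal(row)
               for row in final_single_row_vectors(vectors))
```

For example, the vectors `[1, 4]` and `[2, 3]` can be nested into `[1, 2, 3, 4]`, so the structure is kept, whereas `[1, 3]` and `[2, 4]` cross each other and no whole insertion restores ascending order.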
  2. The computer-based method for parsing natural language syntactic structures according to claim 1, wherein S2 comprises generating families of parallel noun-pronoun combination vectors:
    S2.1. Select two distinct noun-pronoun units:
    A. If there is no other word unit between the two noun-pronoun units, take the two noun-pronoun units as a parallel noun-pronoun combination vector and retain it;
    B. If there are other word units between the two noun-pronoun units, check every word unit between them: if every word unit between the two noun-pronoun units is a noun-pronoun unit or a coordinating connective unit, take the two selected noun-pronoun units together with all the word units between them as a parallel noun-pronoun combination vector and retain it; otherwise, generate no parallel noun-pronoun combination vector;
    S2.2. Repeat S2.1 until all combinations of noun-pronoun units have been traversed, obtaining all generated parallel noun-pronoun combination vectors;
    S2.3. If parallel noun-pronoun combination vectors exist for the possible syntactic parsing structure, partition all the parallel noun-pronoun combination vectors into several families of parallel noun-pronoun combination vectors such that, within each family, every parallel noun-pronoun combination vector contains two common noun-pronoun units;
    S2.4. In each family of parallel noun-pronoun combination vectors, select the highest-numbered word unit contained in any of its combination vectors as the maximum word unit of the family, for later use in generating subjects; and select the lowest-numbered word unit contained in any of its combination vectors as the minimum word unit of the family, for later use in generating objects.
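Steps S2.1, S2.2 and S2.4 can be sketched in Python as follows. The representation is an assumption made for illustration: each word unit is a `(number, kind)` pair, with the hypothetical kind codes `"N"` (noun-pronoun), `"C"` (coordinating connective), `"S"` (subordinating connective) and `"V"` (predicate verb). The partition into families of S2.3 is not sketched here; `family_extremes` simply takes a family as given.

```python
from itertools import combinations


def parallel_np_vectors(units):
    """S2.1 and S2.2: enumerate parallel noun-pronoun combination vectors.

    units is a list of (number, kind) pairs in sentence order.
    Each returned vector is a tuple of unit numbers.
    """
    vectors = []
    noun_positions = [i for i, (_, kind) in enumerate(units) if kind == "N"]
    for a, b in combinations(noun_positions, 2):
        between = units[a + 1:b]
        # Case A is the empty "between"; case B requires only N or C units.
        if all(kind in ("N", "C") for _, kind in between):
            vectors.append(tuple(num for num, _ in units[a:b + 1]))
    return vectors


def family_extremes(family):
    """S2.4: return (minimum word unit, maximum word unit) of one family."""
    numbers = [num for vector in family for num in vector]
    return min(numbers), max(numbers)
```

With the units `N1 C2 N3 V4 N5`, only `(1, 2, 3)` is produced, because the verb unit 4 blocks any combination that reaches unit 5.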
  3. The computer-based method for parsing natural language syntactic structures according to claim 1, wherein generating the corresponding subject element comprises:
    when the corresponding predicate verb unit number is the smallest predicate verb unit number, the possible values of the subject element are: one of the noun-pronoun units whose number is smaller than the corresponding predicate verb unit number; or one of the parallel noun-pronoun combination vectors contained in any family of parallel noun-pronoun combination vectors whose maximum word unit has a number smaller than the corresponding predicate verb unit number; or the empty unit;
    when the corresponding predicate verb unit number is not the smallest predicate verb unit number, the possible values of the subject element are: one of the noun-pronoun units whose number is smaller than the corresponding predicate verb unit number; or one of the parallel noun-pronoun combination vectors contained in any family of parallel noun-pronoun combination vectors whose maximum word unit has a number smaller than the corresponding predicate verb unit number; or one of the syntactic vectors corresponding to the predicate verb units that appear earlier; or the empty unit.
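The candidate set described in claim 3 can be sketched in Python under some illustrative assumptions: word units are `(number, kind)` pairs with the hypothetical kind codes `"N"` and `"V"`, a family is a list of tuples of unit numbers, the empty unit is `None`, and a reference to the syntactic vector of an earlier verb is written as the hypothetical marker `("vector_of", verb_number)`.

```python
def subject_candidates(verb_no, units, families):
    """Enumerate the possible subject elements for one predicate verb unit."""
    verb_numbers = sorted(num for num, kind in units if kind == "V")

    # Noun-pronoun units that precede the verb.
    candidates = [num for num, kind in units if kind == "N" and num < verb_no]

    # Combination vectors from families whose maximum word unit precedes the verb.
    for family in families:
        if max(max(vector) for vector in family) < verb_no:
            candidates.extend(family)

    # Only a non-first verb may take an earlier verb's syntactic vector.
    if verb_no != verb_numbers[0]:
        candidates.extend(("vector_of", v) for v in verb_numbers if v < verb_no)

    candidates.append(None)  # the empty unit
    return candidates
```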
  4. The computer-based method for parsing natural language syntactic structures according to claim 1, wherein generating the corresponding object element comprises:
    when the corresponding predicate verb unit number is the largest predicate verb unit number, the possible values of the object element are: one of the noun-pronoun units whose number is greater than the corresponding predicate verb unit number; or one of the parallel noun-pronoun combination vectors contained in any family of parallel noun-pronoun combination vectors whose minimum word unit has a number greater than the corresponding predicate verb unit number; or the empty unit;
    when the corresponding predicate verb unit number is not the largest predicate verb unit number, the possible values of the object element are: one of the noun-pronoun units whose number is greater than the corresponding predicate verb unit number and smaller than the number of the adjacent predicate verb unit that appears later; or one of the parallel noun-pronoun combination vectors contained in any family of parallel noun-pronoun combination vectors whose minimum word unit has a number greater than the corresponding predicate verb unit number and smaller than the number of the adjacent predicate verb unit that appears later; or one of the syntactic vectors corresponding to the predicate verb units that appear later; or the empty unit.
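The object candidates of claim 4 mirror the subject case, with an upper bound given by the next predicate verb unit. The following Python sketch uses the same illustrative assumptions: `(number, kind)` word units with hypothetical kind codes, families as lists of tuples of unit numbers, `None` for the empty unit, and `("vector_of", verb_number)` as a hypothetical marker for the syntactic vector of a later verb.

```python
def object_candidates(verb_no, units, families):
    """Enumerate the possible object elements for one predicate verb unit."""
    verb_numbers = sorted(num for num, kind in units if kind == "V")
    later_verbs = [v for v in verb_numbers if v > verb_no]
    # For the last verb there is no upper bound on the object position.
    upper = later_verbs[0] if later_verbs else float("inf")

    # Noun-pronoun units between this verb and the next one.
    candidates = [num for num, kind in units
                  if kind == "N" and verb_no < num < upper]

    # Combination vectors from families whose minimum word unit lies in range.
    for family in families:
        if verb_no < min(min(vector) for vector in family) < upper:
            candidates.extend(family)

    # Syntactic vectors of later verbs (none exist for the last verb).
    candidates.extend(("vector_of", v) for v in later_verbs)

    candidates.append(None)  # the empty unit
    return candidates
```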
  5. The computer-based method for parsing natural language syntactic structures according to claim 1, wherein, in steps S4 and S5, a possible linear-expression solution of the syntactic structure is used in place of the possible matrix solution of the syntactic structure;
    the possible linear-expression solution of the syntactic structure is equivalent to the possible matrix solution of the syntactic structure;
    the possible linear-expression solution of the syntactic structure consists of syntactic vector expressions arranged in order of predicate verb unit number; each syntactic vector expression is the expression obtained by partial addition, term by term and in order, of the guide element, subject element, predicate element and object element of the corresponding syntactic vector.
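The relationship between the matrix form and the linear-expression form in claim 5 can be pictured with a short Python sketch. The partial-addition operator is not defined in this section, so the sketch only renders it with a placeholder `+` sign; the four-element tuple layout and the word `empty` for the empty unit are likewise assumptions for illustration.

```python
def linear_expression(matrix):
    """Render each syntactic vector (one matrix row) as one linear expression.

    matrix is a list of (guide, subject, predicate, object) tuples, already
    sorted by predicate verb unit number; None stands for the empty unit.
    """
    expressions = []
    for vector in matrix:
        terms = ["empty" if element is None else str(element)
                 for element in vector]
        # " + " is only a stand-in for the partial-addition operator.
        expressions.append(" + ".join(terms))
    return expressions
```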
  6. The computer-based method for parsing natural language syntactic structures according to claim 1, wherein the method further comprises:
    displaying, in a human-computer interaction interface, the syntactic vectors in the syntactic structure parsing result and the corresponding syntactic structure relationships as a tree structure.
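A tree display of the kind described in claim 6 could be produced, in the simplest case, as indented text. The Python sketch below is only an illustration: the `SyntaxVector` type, its field names and the indentation format are hypothetical, and a real interface would presumably draw a graphical tree instead.

```python
from collections import namedtuple

# Hypothetical container for one syntactic vector.
SyntaxVector = namedtuple("SyntaxVector", "guide subject predicate object")


def render_tree(vector, depth=0):
    """Return the lines of an indented text tree for a syntactic vector."""
    lines = []
    pad = "  " * depth
    for label, element in zip(vector._fields, vector):
        if isinstance(element, SyntaxVector):
            # A nested syntactic vector becomes a subtree.
            lines.append(f"{pad}{label}:")
            lines.extend(render_tree(element, depth + 1))
        else:
            lines.append(f"{pad}{label}: {element}")
    return lines
```

Calling `print("\n".join(render_tree(v)))` on a vector whose object is itself a syntactic vector shows the embedded clause one level deeper than its parent.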
  7. A computer-based apparatus for parsing natural language syntactic structures, comprising:
    a reading component, configured to read the data structure of a preprocessed sentence to be parsed, wherein the preprocessed sentence data structure includes only the coordinating connective units, subordinating connective units, predicate verb units and noun-pronoun units of the sentence, and each word unit is numbered according to its order in the preprocessed sentence and labeled with its type;
    an element generation component, configured to generate, for each predicate verb unit, a corresponding guide element, subject element, predicate element and object element;
    wherein the possible values of the guide element are: one of the coordinating connective units or subordinating connective units whose number is smaller than the corresponding predicate verb unit number; or one of the connective combination vectors formed by a coordinating connective unit whose number is smaller than the corresponding predicate verb unit number and an adjacent subordinating connective unit whose number is smaller than the corresponding predicate verb unit number and greater than the number of that coordinating connective unit; or the empty unit;
    the possible values of the subject element are: one of the noun-pronoun units whose number is smaller than the corresponding predicate verb unit number; or one of the parallel noun-pronoun combination vectors contained in any family of parallel noun-pronoun combination vectors whose maximum word unit has a number smaller than the corresponding predicate verb unit number; or one of the syntactic vectors corresponding to the predicate elements that appear earlier; or the empty unit;
    the predicate element is the corresponding predicate verb unit;
    the possible values of the object element are: one of the noun-pronoun units whose number is greater than the corresponding predicate verb unit number and smaller than the number of the adjacent predicate verb unit that appears later; or one of the parallel noun-pronoun combination vectors contained in any family of parallel noun-pronoun combination vectors whose minimum word unit has a number greater than the corresponding predicate verb unit number and smaller than the number of the adjacent predicate verb unit that appears later; or one of the syntactic vectors corresponding to the predicate elements that appear later; or the empty unit;
    a vector generation component, configured to obtain, from the possible values of the guide element, subject element, predicate element and object element, all possible values of the syntactic vector corresponding to each predicate verb unit, the syntactic vector comprising a guide element, a subject element, a predicate element and an object element;
    a matrix generation component, configured to generate, from all possible values of all the syntactic vectors, at least one possible matrix solution of the syntactic structure, the possible matrix solution consisting of syntactic vectors arranged in order of predicate verb unit number;
    a solving component, configured to verify whether the sentence obtained from a possible matrix solution of the syntactic structure is identical to the preprocessed sentence and, if so, to take the syntactic vectors in that possible matrix solution as one of the syntactic structure parsing results;
    wherein the solving component excludes unqualified possible solutions of the syntactic structure through the operation of the following modules:
    a first exclusion module, which excludes a possible matrix solution of the syntactic structure if there is an order value that does not appear in that possible matrix solution;
    a second exclusion module, which excludes the possible matrix solution if the same order value appears in different syntactic vectors or identical syntactic vectors appear;
    a third exclusion module, which, in each possible matrix solution, performs equivalent substitution on all syntactic vectors that have substitution relationships with other syntactic vectors, and excludes the possible matrix solution if a crossing contradiction between two syntactic vectors appears after the equivalent substitution;
    a fourth exclusion module, which, in each possible matrix solution, performs equivalent substitution on all syntactic vectors that have substitution relationships with other syntactic vectors, and excludes the possible matrix solution if two order values whose positions are reversed appear after the equivalent substitution;
    a fifth exclusion module, which, in any possible matrix solution, if there is a syntactic vector that has no substitution relationship with the other syntactic vectors, performs a gap-insertion operation to obtain all possible syntactic parsing structures corresponding to that possible matrix solution, and verifies whether the sentence obtained from each possible syntactic parsing structure is identical to the preprocessed sentence, and which further comprises:
    a first submodule, which first performs equivalent substitution on the syntactic vectors in the possible matrix solution that have substitution relationships with one another, thereby transforming the possible matrix solution into a set of syntactic vectors that have no substitution relationship with one another
    Figure PCTCN2015083760-appb-100030
    The syntactic vectors in the possible matrix solution are called first-type syntactic vectors, and the syntactic vectors obtained by the transformation
    Figure PCTCN2015083760-appb-100031
    are called second-type syntactic vectors;
    a second submodule, which takes any second-type syntactic vector
    Figure PCTCN2015083760-appb-100032
    and, in a predetermined direction, labels one by one the order value of each syntactic element in
    Figure PCTCN2015083760-appb-100033
    After the order values of the syntactic elements have been labeled, it takes any i-th syntactic element in
    Figure PCTCN2015083760-appb-100034
    and constructs a single gap only on the first side of that syntactic element; after the gap has been constructed, it takes any second-type syntactic vector other than the syntactic vector
    Figure PCTCN2015083760-appb-100035
    namely
    Figure PCTCN2015083760-appb-100036
    and inserts the syntactic vector
    Figure PCTCN2015083760-appb-100037
    as a whole into the constructed gap, thereby generating a new syntactic vector; this new syntactic vector is denoted
    Figure PCTCN2015083760-appb-100038
    and the syntactic vectors obtained by whole insertion are collectively called third-type syntactic vectors;
    a third submodule, which, for a third-type syntactic vector
    Figure PCTCN2015083760-appb-100039
    labels with order values, in the predetermined direction, every syntactic element starting from the first syntactic element on the first side of the vector
    Figure PCTCN2015083760-appb-100040
    and ending, within the vector
    Figure PCTCN2015083760-appb-100041
    at the first syntactic element on the second side of the contained vector
    Figure PCTCN2015083760-appb-100042
    The elements that lie, within the vector
    Figure PCTCN2015083760-appb-100043
    on the first side of the contained vector
    Figure PCTCN2015083760-appb-100044
    are not labeled with order values. The first syntactic element on the second side of the vector
    Figure PCTCN2015083760-appb-100045
    is denoted
    Figure PCTCN2015083760-appb-100046
    and the part of the vector
    Figure PCTCN2015083760-appb-100047
    that is labeled in the foregoing manner is denoted the tail syntactic vector
    Figure PCTCN2015083760-appb-100048
    After the order values have been labeled, it takes any j-th syntactic element in the tail syntactic vector and constructs a single gap only on the first side of that element; after the gap has been constructed, it takes any unused second-type syntactic vector
    Figure PCTCN2015083760-appb-100049
    and inserts the syntactic vector
    Figure PCTCN2015083760-appb-100050
    as a whole into the constructed gap, thereby generating a new syntactic vector, which is denoted
    Figure PCTCN2015083760-appb-100051
    or
    for a third-type syntactic vector
    Figure PCTCN2015083760-appb-100052
    labels with order values, in the predetermined direction, every syntactic element in the syntactic vector
    Figure PCTCN2015083760-appb-100053
    After the order values of the syntactic elements have been labeled, it takes any t-th syntactic element in
    Figure PCTCN2015083760-appb-100054
    and constructs a single gap on the first side of that syntactic element; after the gap has been constructed, it takes any unused second-type syntactic vector
    Figure PCTCN2015083760-appb-100055
    and inserts the vector
    Figure PCTCN2015083760-appb-100056
    as a whole into the previously constructed gap, thereby generating a new vector, which is denoted
    Figure PCTCN2015083760-appb-100057
    a fourth submodule, which repeats the operation of the third submodule: each time a gap-construction and gap-insertion step finishes, it performs the next gap-construction and gap-insertion operation on the third-type syntactic vector obtained from the previous step, until all second-type syntactic vectors
    Figure PCTCN2015083760-appb-100058
    have been inserted, finally yielding a single-row third-type syntactic vector; this finally obtained third-type syntactic vector is called the final single-row vector;
    a fifth submodule, which excludes a possible syntactic parsing structure if every final single-row vector corresponding to that possible syntactic parsing structure contains two order values whose positions are reversed;
    a sixth submodule, which repeatedly invokes the operations of the second to fifth submodules until all possible syntactic parsing structures have been traversed.
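The first and second exclusion modules of claim 7 amount to a coverage check and a disjointness check over order values. The Python sketch below illustrates them under stated assumptions: a matrix solution is a list of four-element tuples, each element is `None` (empty unit), an integer order value, or a tuple of order values (a combination vector), and word units are numbered from 1 to `n_units`. References from one syntactic vector to another, and therefore the "identical syntactic vectors" part of the second module, are not modeled. The function names are hypothetical.

```python
def order_values(vector):
    """Collect the order values that occur in one syntactic vector."""
    values = []
    for element in vector:
        if element is None:
            continue  # the empty unit carries no order value
        if isinstance(element, int):
            values.append(element)
        else:
            values.extend(element)  # a combination vector of order values
    return values


def passes_first_two_exclusions(matrix, n_units):
    """Return False if the first or second exclusion module would reject."""
    per_vector = [set(order_values(vector)) for vector in matrix]
    all_values = set().union(*per_vector)

    # First module: every order value 1..n_units must appear somewhere.
    if all_values != set(range(1, n_units + 1)):
        return False

    # Second module: no order value may be shared by two different vectors.
    return sum(len(values) for values in per_vector) == len(all_values)
```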
  8. The computer-based apparatus for parsing natural language syntactic structures according to claim 7, further comprising:
    a result display component, which displays, on a human-computer interaction interface, the syntactic vectors in the syntactic structure parsing result and the corresponding syntactic structure relationships as a tree structure.
PCT/CN2015/083760 2014-08-22 2015-07-10 Computer-based method and device for parsing natural language syntactic structures WO2016026359A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN2014104196340 2014-08-22
CN201410419634.0A CN104156353B (en) 2014-08-22 2014-08-22 A kind of method and apparatus of computer based natural language syntactic structure parsing

Publications (1)

Publication Number Publication Date
WO2016026359A1 (en) 2016-02-25

Family

ID=51881858

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2015/083760 WO2016026359A1 (en) 2014-08-22 2015-07-10 Computer-based method and device for parsing natural language syntactic structures

Country Status (2)

Country Link
CN (1) CN104156353B (en)
WO (1) WO2016026359A1 (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104156353B (en) * 2014-08-22 2017-10-31 秦一男 A kind of method and apparatus of computer based natural language syntactic structure parsing
CN107422691B (en) * 2017-08-11 2020-05-12 山东省计算中心(国家超级计算济南中心) Collaborative PLC programming language construction method
CN108009234B (en) * 2017-11-29 2022-02-11 苏州大学 Extraction method, device and equipment of non-entity type argument
CN111666405B (en) * 2019-03-06 2023-07-07 百度在线网络技术(北京)有限公司 Method and device for identifying text implication relationship
CN110020434B (en) * 2019-03-22 2021-02-12 北京语自成科技有限公司 Natural language syntactic analysis method

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040015342A1 (en) * 2002-02-15 2004-01-22 Garst Peter F. Linguistic support for a recognizer of mathematical expressions
CN102945230A (en) * 2012-10-17 2013-02-27 刘运通 Natural language knowledge acquisition method based on semantic matching driving
CN103927298A (en) * 2014-04-25 2014-07-16 秦一男 Natural language syntactic structure analyzing method and device based on computer
CN104156353A (en) * 2014-08-22 2014-11-19 秦一男 Computer-based method and device for analyzing natural language syntactic structures

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101075929B (en) * 2007-03-02 2010-11-24 腾讯科技(深圳)有限公司 Method, system and server for inquiring information

Also Published As

Publication number Publication date
CN104156353B (en) 2017-10-31
CN104156353A (en) 2014-11-19

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15834546

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 15834546

Country of ref document: EP

Kind code of ref document: A1