CN100555270C

CN100555270C - Method and system for automatic evaluation of machine translation

Info

Publication number: CN100555270C
Application number: CNB2004100006288A
Authority: CN
Inventors: 刘群; 刘洋
Original assignee: Institute of Computing Technology of CAS
Current assignee: Huawei Technologies Co Ltd
Priority date: 2004-01-13
Filing date: 2004-01-13
Publication date: 2009-10-28
Anticipated expiration: 2024-01-13
Also published as: CN1641631A

Abstract

The invention discloses a method and system for the automatic evaluation of machine translation. The method searches at least one reference translation for matched sub-segments of the output translation of a machine translation system, computes an entropy from these matched sub-segments and their lengths, computes a length penalty coefficient and a matching ratio penalty coefficient, and finally obtains a score that serves as the evaluation index. The automatic machine translation evaluation system of the invention comprises a sub-segment search module, a length computation module, a length penalty module, a matching ratio penalty module and an evaluation score computation module. The method and system can accurately assess translation quality, can produce an evaluation index with absolute meaning, and do not limit the length of matched sub-segments during automatic evaluation.

Description

Method and system for automatic evaluation of machine translation

Technical field

The present invention relates to the field of natural language processing and, more particularly, to a method for the automatic evaluation of machine translation.

Background technology

Machine translation (MT) is the process of using a computer to convert one natural language into another; the software that carries out this process is called a machine translation system. Evaluating the translation quality of machine translation systems is a core and critical part of the natural language processing field. In recent years, evaluation has received increasingly wide attention in natural language research. MT evaluation is significant for machine translation research and development: developers of machine translation systems can learn from evaluation what problems their systems have and keep improving them; system integrators can select well-performing systems to integrate into software platforms based on evaluation results; users can select products that meet their needs based on evaluation reports; and for machine translation researchers, evaluation provides the most reliable basis for the direction of their technical development.

At present, MT evaluation is usually done either manually or automatically. Manual evaluation of a machine translation system is costly, requires a great deal of manpower and time, and suffers from difficulty in maintaining consistency. Automatic machine translation evaluation uses software stored in a computer to evaluate the machine translation system automatically; it is fast, gives timely feedback and guarantees consistent evaluation, and is therefore of great importance for machine translation research and application.

Current automatic machine translation evaluation methods are all based on string matching between the output translation of the machine translation system and pre-specified reference translations, that is, on searching the reference translations for strings that occur in the output translation. The matched strings can be processed in many ways; methods based on N-gram co-occurrence are the main approach in current automatic machine translation evaluation. There are two main N-gram co-occurrence methods: BLEU and NIST. BLEU often assigns a score of zero to a single sentence and discriminates poorly. NIST has the advantage of taking information content into account and discriminating by the informativeness of words, but its scores have only relative meaning and cannot judge translation quality in absolute terms. In addition, both BLEU and NIST limit the maximum length of each matched sub-segment (9 in the latest version).

Summary of the invention

The purpose of this invention is to provide a new method for the automatic evaluation of machine translation that uses entropy as the evaluation metric for the output translation of a machine translation system and can produce an evaluation result with absolute meaning.

To achieve this purpose, the automatic machine translation evaluation method provided by the invention comprises the following steps:

1) Search at least one reference translation for matched sub-segments of the output translation, a matched sub-segment being a matched character string distributed discretely in the output translation; the length of the i-th matched sub-segment is l_i, i = 1, ..., n, where n is the number of matched sub-segments in the output translation;

2) Compute the entropy from the matched sub-segments

H = -\sum_{i=1}^{n} \frac{l_i}{L} \log_2 \frac{l_i}{L},

where

L = \sum_{i=1}^{n} l_i

is the total length of the matched sub-segments; the entropy H serves as an evaluation index of the output translation;

3) Compute the length penalty coefficient

LP = 2^{\left| \frac{L_{can}}{\overline{L}_{ref}} - 1 \right|},

where L_{can} is the length of the output translation and \overline{L}_{ref} is the average length of the at least one reference translation;

4) Compute the matching ratio penalty coefficient

r = \frac{L}{L_{can}};

5) Compute the evaluation score segScore = a^{-H \cdot LP} \times r from the entropy H, the length penalty coefficient LP and the matching ratio penalty coefficient r, where a is a parameter taking a value in the range 1.1 to 1.5, preferably 1.2; the evaluation score segScore serves as the evaluation index of the output translation.
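Steps 2) to 5) can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the function names `entropy` and `seg_score` and the sample numbers are hypothetical, the sub-segment lengths are assumed to come from step 1), and the score is computed under the reading in which LP multiplies H in the exponent and r is a separate factor.

```python
import math

def entropy(lengths):
    """Step 2: entropy H of the matched sub-segment lengths (all lengths > 0)."""
    total = sum(lengths)
    if total == 0:
        return math.inf  # nothing matched: treated as H = +infinity
    return sum(-(l / total) * math.log2(l / total) for l in lengths)

def seg_score(lengths, cand_len, ref_lens, a=1.2):
    """Steps 3 to 5: combine H, the length penalty LP and the matching ratio r."""
    h = entropy(lengths)
    if math.isinf(h):
        return 0.0  # assumption: a translation with no match scores 0
    avg_ref = sum(ref_lens) / len(ref_lens)
    lp = 2 ** abs(cand_len / avg_ref - 1)   # step 3: length penalty coefficient
    r = sum(lengths) / cand_len             # step 4: matching ratio coefficient
    return a ** (-h * lp) * r               # step 5: assumed reading of the formula

# Hypothetical input: four sub-segments of 3, 1, 4 and 1 words in a 12-word output.
print(round(seg_score([3, 1, 4, 1], cand_len=12, ref_lens=[11, 12, 13, 12]), 4))
```

An output that matches a reference as one unbroken string gives H = 0 and r = 1, so the score is 1; more fragmented or sparser matches push the score toward 0.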

The present invention also provides an automatic machine translation evaluation system for processing the output translation of a machine translation system and at least one given reference translation. The system comprises a computer, and the computer comprises:

a sub-segment search module, for searching the at least one reference translation for matched sub-segments of the output translation;

a length computation module, for computing the length of each matched sub-segment found by the sub-segment search module and the total length of all matched sub-segments, where the length of the i-th matched sub-segment is l_i, i = 1, ..., n, n is the number of matched sub-segments in the output translation, and

L = \sum_{i=1}^{n} l_i

is the total length of the matched sub-segments;

an entropy module, for computing the entropy from the matched sub-segments

H = -\sum_{i=1}^{n} \frac{l_i}{L} \log_2 \frac{l_i}{L};

a length penalty module, for computing the length penalty coefficient

LP = 2^{\left| \frac{L_{can}}{\overline{L}_{ref}} - 1 \right|};

a matching ratio penalty module, for computing the matching ratio penalty coefficient

r = \frac{L}{L_{can}};

an evaluation score computation module, for computing the evaluation score segScore = a^{-H \cdot LP} \times r from the entropy H, the length penalty coefficient LP and the matching ratio penalty coefficient r, where a takes a value in the range 1.1 to 1.5, preferably 1.2.

The automatic machine translation evaluation method and system of the present invention use entropy to characterize the distribution of the matched sub-segments, and use either the entropy directly or a score obtained after introducing penalties as the index of the quality of the system output translation. Entropy, as a statistic, is itself a measure of uncertainty, so when applied to automatic machine translation evaluation it reflects how the matched sub-segments are distributed. The lower the entropy, the more concentrated the distribution of matched sub-segments, the higher the degree of matching between the translation and the reference translations, and the better the translation quality; the higher the entropy, the more scattered the distribution of matched sub-segments, the lower the degree of matching, and the worse the translation quality. Entropy therefore reflects the quality of the output translation well when used as an evaluation index. The present invention also introduces suitable penalties on top of the entropy, to avoid improper evaluation caused by a system output translation that is too long or too short, or by too small a matching ratio. Furthermore, whether the entropy or the penalized score is used as the index of output translation quality, the index has absolute meaning. Finally, the present invention does not limit the length l_i of each matched sub-segment when computing the entropy, overcoming the drawback of existing N-gram co-occurrence methods, which must limit the length of matched sub-segments; it is therefore more flexible and more widely applicable.
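The relationship between entropy and fragmentation can be illustrated with a short Python sketch. The helper name `entropy` and the example length lists are assumptions chosen for illustration: the same six matched words are split into progressively more sub-segments, and the entropy rises from 0 (one unbroken match) to log2 6, about 2.585 (six isolated words).

```python
import math

def entropy(lengths):
    """Entropy of the matched sub-segment length distribution (lengths > 0)."""
    total = sum(lengths)
    return sum(-(l / total) * math.log2(l / total) for l in lengths)

# Six matched words, grouped into 1, 2, 3 and 6 sub-segments.
for lengths in ([6], [3, 3], [2, 2, 2], [1, 1, 1, 1, 1, 1]):
    print(lengths, round(entropy(lengths), 3))
```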

Description of drawings

Fig. 1 is a flowchart of the automatic machine translation evaluation method of the present invention;

Fig. 2 is a schematic diagram of matched segments and matched sub-segments in one embodiment of the invention.

Embodiment

The present invention is described in further detail below with reference to the drawings and specific embodiments.

An automatic machine translation evaluation system is normally implemented as a computer running a specific program. For a given article to be translated, the machine translation system translates the article and outputs a translation, and professional translators also translate the article to provide reference translations. Both the output translation from the machine translation system and the reference translations from the translators are stored in the computer, whose program evaluates the output translation of the machine translation system. In the present invention, the flowchart of the computer program in the automatic machine translation evaluation system is shown in Fig. 1.

Like most matching-based automatic evaluation methods for machine translation, for example those based on N-gram co-occurrence and those based on computing a harmonic mean, the automatic evaluation method of the present invention first analyzes how the output translation of the machine translation system matches the reference translations. As shown in Fig. 1, the method first searches for matched sub-segments in step 10. In the present invention, matched segments and matched sub-segments are character strings in the output translation that match the reference translations. In general, the system output translation and the reference translation are not identical, so the matched strings are necessarily distributed discretely in the system output translation, separated by unmatched strings; these discretely distributed matched strings are called matched segments.

Although the words inside a matched segment are contiguous in position, when there are several reference translations a matched segment may not have been matched as a whole: one part may be matched in one reference translation and another part in a different reference translation. With multiple reference translations, a string matched within a single reference translation is called a matched sub-segment, and a string matched across several reference translations is called a matched segment. Among the sub-segments of a matched segment, the longest is kept preferentially. If two sub-segments overlap, the longer one is kept intact and the overlapping part is cut from the other. This guarantees that the sub-segments do not overlap.
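One possible way to realize this search is sketched below in Python. The text does not prescribe an algorithm, so everything here is an assumption made for illustration: the function names `matched_spans` and `sub_segments`, word-level tokens, brute-force matching, and a greedy "longest first" pass that trims overlaps from shorter spans.

```python
def matched_spans(cand, ref):
    """Maximal runs of cand that occur contiguously in ref,
    as half-open (start, end) index pairs into cand."""
    spans = set()
    for i in range(len(cand)):
        for j in range(len(ref)):
            if cand[i] != ref[j]:
                continue
            # Skip positions that merely continue a run started further left.
            if i > 0 and j > 0 and cand[i - 1] == ref[j - 1]:
                continue
            k = 0
            while i + k < len(cand) and j + k < len(ref) and cand[i + k] == ref[j + k]:
                k += 1
            spans.add((i, i + k))
    return spans

def sub_segments(cand, refs):
    """Greedy selection: longest spans are kept whole, and any part of a
    shorter span already covered by a longer one is cut away."""
    candidates = set()
    for ref in refs:
        candidates |= matched_spans(cand, ref)
    covered = [False] * len(cand)
    result = []
    for start, end in sorted(candidates, key=lambda s: (s[0] - s[1], s[0])):
        run_start = None
        for pos in range(start, end + 1):
            free = pos < end and not covered[pos]
            if free and run_start is None:
                run_start = pos
            elif not free and run_start is not None:
                result.append((run_start, pos))  # uncovered remainder of the span
                run_start = None
        for pos in range(start, end):
            covered[pos] = True
    return sorted(result)

cand = "the cat sat on the mat today".split()
refs = ["the cat sat down".split(), "on the mat today it sat".split()]
segs = sub_segments(cand, refs)
print(segs)                                  # [(0, 3), (3, 7)]
print([end - start for start, end in segs])  # [3, 4]
```

In this hypothetical example all seven words of the output are matched, so they form one matched segment, but it is made of two sub-segments: three words found in the first reference and four found in the second.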

The meaning of matched segments and matched sub-segments is clearer in the example of Fig. 2. As shown in Fig. 2, the four rows above the arrow represent the matches between the system output translation and each of four reference translations. Cells with a black dot represent matched words, and cells without a black dot represent unmatched words. The row below the arrow shows the final match between the output translation and the four reference translations.

In the row below the arrow, three consecutive cells marked 1 and one cell marked 2 form a matched segment A, which is composed of several word pieces matched in the four reference translations (the cells with black dots in Fig. 2). Within matched segment A, the longest sub-segment a₁ consists of the three consecutive cells marked 1, corresponding to the leftmost three consecutive dotted cells in the first row of Fig. 2; these three cells therefore form matched sub-segment a₁, and the remaining cell of matched segment A, marked 2, is matched sub-segment a₂. Likewise, in the row below the arrow, four consecutive cells marked 3 and one cell marked 4 form a matched segment B. Within matched segment B, the longest sub-segment b₁ consists of the four consecutive cells marked 3, corresponding to the four consecutive dotted cells in positions 2 to 5, counted from the right, in the second row of Fig. 2; these four cells therefore form matched sub-segment b₁, and the remaining cell of matched segment B, marked 4, is matched sub-segment b₂.

In summary, in the example of Fig. 2 the match between the output translation and the four reference translations comprises two matched segments, A and B, or equivalently four matched sub-segments a₁, a₂, b₁ and b₂. As the following description shows, the entropy in the present invention is computed over matched sub-segments.

In the present invention, the search for matched sub-segments of the output translation with respect to the reference translations is performed by a sub-segment search module of the computer. This module is implemented as a program stored in the computer, which is straightforward for those skilled in the art.

In step 11 of Fig. 1, the present invention uses entropy to characterize the distribution of these matched sub-segments. Suppose the system output translation has n matched sub-segments with respect to the reference translations, and the length of the i-th sub-segment is l_i (i = 1, ..., n). The total length L of the matched sub-segments is then:

L = \sum_{i=1}^{n} l_i

The entropy H is:

H = -\sum_{i=1}^{n} \frac{l_i}{L} \log_2 \frac{l_i}{L}

If the system output translation is identical to a reference translation, then H = 0; if they are completely different, H can be defined as +∞. The higher the quality of the system output translation, the smaller the corresponding H. The quality of a translation can thus largely be judged from the entropy H.
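As a worked illustration, take the four matched sub-segments of the Fig. 2 example, whose lengths are 3, 1, 4 and 1 words (a₁, a₂, b₁ and b₂). The numerical values are computed here for illustration and rounded to four decimal places; they do not appear in the original text.

```latex
L = 3 + 1 + 4 + 1 = 9

H = -\left( \frac{3}{9}\log_2\frac{3}{9} + \frac{1}{9}\log_2\frac{1}{9}
          + \frac{4}{9}\log_2\frac{4}{9} + \frac{1}{9}\log_2\frac{1}{9} \right)
  \approx 0.5283 + 0.3522 + 0.5200 + 0.3522 = 1.7527
```

Had the same nine words been matched as a single unbroken sub-segment, the sum would contain one term with l_1 / L = 1 and H would be 0.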

In the present invention, the lengths l_i of the matched sub-segments and the total length L of all matched sub-segments are computed by a length computation module of the computer, and the entropy H is computed by an entropy module of the computer. The length computation module and the entropy module are implemented as programs stored in the computer, which is straightforward for those skilled in the art.

If the system output translation is too long or too short, the entropy computed in this way may not reflect the true quality of the translation. Therefore, as a further improvement of the present invention, a length penalty is introduced in step 12:

LP = 2^{\left| \frac{L_{can}}{\overline{L}_{ref}} - 1 \right|},

where LP is the length penalty coefficient, L_{can} is the length of the system output translation, and \overline{L}_{ref} is the average length of the reference translations.
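A minimal Python sketch of the length penalty coefficient follows. The function name `length_penalty` and the sample lengths are illustrative assumptions. LP equals 1 when the output has exactly the average reference length and grows as the output becomes longer or shorter; under the reading in which LP multiplies H in the exponent of the final score, a larger LP lowers the score.

```python
def length_penalty(cand_len, ref_lens):
    """LP = 2 ** |L_can / mean(L_ref) - 1|; always >= 1."""
    avg_ref = sum(ref_lens) / len(ref_lens)
    return 2 ** abs(cand_len / avg_ref - 1)

# Hypothetical reference lengths averaging 10 words.
for cand_len in (5, 10, 15, 20):
    print(cand_len, round(length_penalty(cand_len, [9, 10, 11]), 3))
```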

If very little of the system output translation is matched, a low H may still be obtained. Therefore, in step 13, a matching ratio penalty is applied for translations with too few matches:

r = \frac{L}{L_{can}}

where r is the matching ratio penalty coefficient.
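A small worked case shows why this coefficient is needed. Suppose, hypothetically, that a 10-word output translation shares only a single word with the reference translations:

```latex
n = 1, \quad l_1 = L = 1 \quad\Rightarrow\quad H = -\frac{1}{1}\log_2\frac{1}{1} = 0

r = \frac{L}{L_{can}} = \frac{1}{10} = 0.1
```

Judged by entropy alone, this translation would look as good as a perfect one (H = 0); the coefficient r = 0.1 is what exposes how little of it was actually matched.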

After the length penalty and the matching ratio penalty have been applied, the translation quality score is computed in step 14 with the following formula:

segScore = a^{-H \cdot LP} \times r,

where segScore is the score after the penalties are introduced and a is an empirical parameter whose value is adjusted in the range 1.1 to 1.5, preferably a = 1.2. With the penalties introduced, segScore varies between 0 and 1, and the higher the quality of the system output translation, the larger the corresponding segScore.
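The stated range of 0 to 1 can be checked directly, assuming the reading of the formula in which LP multiplies H in the exponent and r is a separate factor (the reading that is consistent with that range):

```latex
H \ge 0, \quad LP = 2^{\left| \frac{L_{can}}{\overline{L}_{ref}} - 1 \right|} \ge 1, \quad a > 1
\;\Rightarrow\; 0 < a^{-H \cdot LP} \le 1

0 \le r = \frac{L}{L_{can}} \le 1
\;\Rightarrow\; 0 \le segScore = a^{-H \cdot LP} \times r \le 1
```

The bound r ≤ 1 holds because the matched sub-segments are non-overlapping parts of the output translation. The score reaches 1 only when H = 0 and r = 1, that is, when the whole output translation is matched as a single sub-segment.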

In the present invention, the length penalty coefficient LP, the matching ratio penalty coefficient r and the translation quality score segScore are computed by the length penalty module, the matching ratio penalty module and the evaluation score computation module of the computer, respectively. These modules are implemented as programs stored in the computer, which is straightforward for those skilled in the art.

Claims

1. A method for automatic evaluation of machine translation, for processing the output translation of a machine translation system, comprising the following steps:

2) computing the entropy from the matched sub-segments

H = -\sum_{i=1}^{n} \frac{l_i}{L} \log_2 \frac{l_i}{L},

where

L = \sum_{i=1}^{n} l_i

3) computing the length penalty coefficient

LP = 2^{\left| \frac{L_{can}}{\overline{L}_{ref}} - 1 \right|},

4) computing the matching ratio penalty coefficient

r = \frac{L}{L_{can}};

5) computing the evaluation score segScore = a^{-H \cdot LP} \times r from the entropy H, the length penalty coefficient LP and the matching ratio penalty coefficient r, where a is a parameter taking a value in the range 1.1 to 1.5; the evaluation score segScore serves as the evaluation index of the output translation.

2. The method for automatic evaluation of machine translation according to claim 1, characterized in that the parameter a takes the value 1.2.

3. A system for automatic evaluation of machine translation, for processing the output translation of a machine translation system and at least one given reference translation, the system comprising a computer, characterized in that the computer comprises:

L = \sum_{i=1}^{n} l_i

is the total length of the matched sub-segments;

H = -\sum_{i=1}^{n} \frac{l_i}{L} \log_2 \frac{l_i}{L};

LP = 2^{\left| \frac{L_{can}}{\overline{L}_{ref}} - 1 \right|},

r = \frac{L}{L_{can}};

an evaluation score computation module, for computing the evaluation score segScore = a^{-H \cdot LP} \times r from the entropy H, the length penalty coefficient LP and the matching ratio penalty coefficient r, where a takes a value in the range 1.1 to 1.5.

4. The system for automatic evaluation of machine translation according to claim 3, characterized in that the value of a in the evaluation score computation module is 1.2.