CN105183713A

CN105183713A - English composition automatic correcting method and system

Info

Publication number: CN105183713A
Application number: CN201510536642.8A
Authority: CN
Inventors: 唐聪; 宋文略; 杨晓昊; 许轶; 肖迪
Original assignee: Beijing Focusedu International Education Consultation Co Ltd
Current assignee: Beijing Focusedu International Education Consultation Co Ltd
Priority date: 2015-08-27
Filing date: 2015-08-27
Publication date: 2015-12-23

Abstract

The present invention provides a method and system for automatically correcting English compositions. The method comprises: extracting a plurality of text features from a composition to be corrected; scoring the plurality of text features by a revised preset scoring rule; and obtaining a first average score of the scores of the plurality of text features, and using the first average score as the score of the composition to be corrected. By using a two-stage method that fuses statistics and rules, the method both reduces the large volume of corpus data that purely statistical methods require and helps address the comprehensiveness and accuracy problems that arise when rules are written by hand. The weight parameter of each text feature is determined by a score-deviation correction method, which makes the weight parameters more accurate. The volume of data that must be collected and labeled is reduced, saving time and manpower. Furthermore, combining the statistics-based and rule-based technical schemes makes the results more robust.

Description

Method and system for automatically correcting English compositions

Technical field

The present invention relates to the technical field of correcting teaching assignments, and in particular to a method and system for automatically correcting English compositions.

Background art

Automatic correction of English compositions not only reduces the workload of teachers; it also lets students score and revise their compositions on their own, improving their writing ability and skills efficiently and accurately.

One existing method for automatically correcting English compositions performs statistical analysis over a large corpus: by computing the distance between a composition and a standard corpus, it generates a score and a content analysis for the composition in real time. It produces a comprehensive score from four major feature groups (vocabulary, sentences, article structure, and content relevance) and comments on the composition sentence by sentence. However, this method relies too heavily on the corpus. Automatic correction is difficult to achieve when template corpus data is insufficient, and in that case the correction accuracy is very low. Because a large volume of template corpus data is hard to obtain, it is difficult to correct compositions automatically in batches with this method.

Summary of the invention

In view of the defects in the prior art, the present invention provides a method and system for automatically correcting English compositions, which achieve automatic correction of compositions in batches.

In a first aspect, the present invention provides a method for automatically correcting English compositions, comprising:

Extracting a plurality of text features of a composition to be corrected;

Scoring the plurality of text features by a revised preset scoring rule;

Obtaining a first average score of the scores of the plurality of text features, and using the first average score as the score of the composition to be corrected.

Optionally, extracting the plurality of text features of the composition to be corrected comprises:

Extracting word features, sentence features, paragraph structure features, and topic-sentence semantic features of the composition to be corrected.

Optionally, before the plurality of text features are scored by the revised preset scoring rule, the method further comprises:

Obtaining the revised preset scoring rule.

Optionally, obtaining the revised preset scoring rule comprises:

Scoring the plurality of text features by a preset scoring rule, and obtaining a second average score of the scores of the plurality of text features according to those scores and the preset weight values corresponding to the plurality of text features;

Scoring the plurality of text features by preset labeled corpus data, and obtaining a third average score of the scores of the plurality of text features;

Comparing the second average score with the third average score, and determining, according to the comparison result, whether to revise the preset weight values of the preset scoring rule.

Optionally, determining, according to the comparison result, whether to revise the preset weight values of the preset scoring rule comprises:

Comparing the second average score with the third average score, and obtaining the absolute value of the difference between the second average score and the third average score;

If the absolute value of the difference is less than or equal to a preset difference, the preset weight values of the preset scoring rule do not need to be revised;

Or

If the absolute value of the difference is greater than the preset difference, revising the preset weight values of the preset scoring rule, and obtaining a fourth average score of the scores of the plurality of text features again with the revised weight values, until the absolute value of the difference between the fourth average score and the third average score is less than or equal to the preset difference.
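The comparison described above can be written compactly as follows. This is only a sketch: the symbols $s_i$ (score of feature $i$), $w_i$ (its preset weight value), $\bar{S}_2$, $\bar{S}_3$, $\bar{S}_4$ (the second, third, and fourth average scores), and $\delta$ (the preset difference) are notation introduced here, and normalizing by the sum of the weights is an assumption, since the text does not say how the weights enter the average.

```latex
\bar{S}_2 = \frac{\sum_i w_i\, s_i}{\sum_i w_i},
\qquad
\Delta = \left| \bar{S}_2 - \bar{S}_3 \right|

\text{If } \Delta \le \delta:\ \text{keep the weights } w_i .

\text{If } \Delta > \delta:\ \text{revise } w_i \to w_i',\quad
\bar{S}_4 = \frac{\sum_i w_i'\, s_i}{\sum_i w_i'},\quad
\text{repeat until } \left| \bar{S}_4 - \bar{S}_3 \right| \le \delta .
```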

In a second aspect, the present invention further provides a system for automatically correcting English compositions, comprising:

An extraction module, configured to extract a plurality of text features of a composition to be corrected;

A scoring module, configured to score the plurality of text features by a revised preset scoring rule;

A first acquisition module, configured to obtain a first average score of the scores of the plurality of text features, and to use the first average score as the score of the composition to be corrected.

Optionally, the extraction module is configured to:

Optionally, the system further comprises a second acquisition module, configured to obtain the revised preset scoring rule before the plurality of text features are scored by the revised preset scoring rule.

Optionally, the second acquisition module is configured to:

Or

As can be seen from the above technical solutions, the present invention provides a method and system for automatically correcting English compositions. The method does not need a large amount of corpus data, which reduces the demand for corpus data volume. Because the plurality of features are scored by the revised preset scoring rule, the final scoring result is comparatively accurate, and compositions can be corrected automatically in batches.

Brief description of the drawings

To illustrate the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.

Fig. 1 is a schematic flowchart of a method for automatically correcting English compositions provided by one embodiment of the present invention;

Fig. 2 is a schematic flowchart of a method for automatically correcting English compositions provided by another embodiment of the present invention;

Fig. 3 is a schematic structural diagram of a system for automatically correcting English compositions provided by one embodiment of the present invention.

Detailed description of the embodiments

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. The described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by those of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present invention.

Fig. 1 shows a schematic flowchart of a method for automatically correcting English compositions provided by one embodiment of the present invention. As shown in Fig. 1, the method comprises the following steps:

101. Extract a plurality of text features of a composition to be corrected;

102. Score the plurality of text features by a revised preset scoring rule;

103. Obtain a first average score of the scores of the plurality of text features, and use the first average score as the score of the composition to be corrected.

The above method does not need a large amount of corpus data, which reduces the demand for corpus data volume. Because the plurality of features are scored by the revised preset scoring rule, the final scoring result is comparatively accurate, and compositions can be corrected automatically in batches.

It will be understood that step 101 above extracts a plurality of text features of the composition to be corrected. The plurality of text features specifically comprise word features, sentence features, paragraph structure features, and the semantic feature of the topic sentence. The word features comprise the character length and vocabulary level of the words; the sentence features comprise the character length and clause complexity of the sentences; the paragraph structure feature refers to the distribution of character lengths across the paragraphs; and the semantic feature of the topic sentence refers to the degree of character matching between the topic sentence and the composition title.
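The four feature groups named above can be sketched in Python as follows. This is a minimal illustration under several assumptions that are not in the source: the vocabulary-level table `VOCAB_LEVEL`, the clause-marker list `CLAUSE_MARKERS`, the use of the first sentence of the first paragraph as the topic sentence, and the use of `difflib.SequenceMatcher` as the character matching measure are all hypothetical stand-ins.

```python
import re
from difflib import SequenceMatcher

# Hypothetical vocabulary-level table; the source does not specify one.
VOCAB_LEVEL = {"good": 1, "significant": 2, "ubiquitous": 3}
# Hypothetical proxy for clause complexity: commas plus these connectives.
CLAUSE_MARKERS = {"because", "although", "which", "that", "while", "if", "when"}


def split_sentences(text):
    """Split on whitespace that follows ., ! or ? (a rough heuristic)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extract_features(title, essay):
    paragraphs = [p.strip() for p in essay.split("\n\n") if p.strip()]
    sentences = split_sentences(essay)
    words = re.findall(r"[A-Za-z']+", essay)
    n_words = max(len(words), 1)          # guard against empty input
    n_sentences = max(len(sentences), 1)

    def clause_markers(sentence):
        tokens = re.findall(r"[A-Za-z']+", sentence.lower())
        return sentence.count(",") + sum(t in CLAUSE_MARKERS for t in tokens)

    # Assumption: the topic sentence is the first sentence of the first paragraph.
    first = split_sentences(paragraphs[0]) if paragraphs else []
    topic = first[0] if first else ""

    return {
        # Word features: character length and vocabulary level.
        "mean_word_length": sum(len(w) for w in words) / n_words,
        "mean_vocab_level": sum(VOCAB_LEVEL.get(w.lower(), 1) for w in words) / n_words,
        # Sentence features: character length and clause complexity.
        "mean_sentence_length": sum(len(s) for s in sentences) / n_sentences,
        "mean_clause_markers": sum(clause_markers(s) for s in sentences) / n_sentences,
        # Paragraph structure: character length of each paragraph.
        "paragraph_lengths": [len(p) for p in paragraphs],
        # Topic-sentence feature: character match with the title, in [0, 1].
        "topic_title_match": SequenceMatcher(None, topic.lower(), title.lower()).ratio(),
    }
```

Calling `extract_features("Reading is good", essay)` returns a dictionary with one entry per feature; a real system would replace the toy tables with a graded word list and a proper clause analysis.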

Before step 102 above scores the plurality of text features by the revised preset scoring rule, the method further comprises a step not shown in Fig. 1:

Obtaining the revised preset scoring rule.

Specifically, the method for automatically correcting English compositions proposed by the present invention is divided into two main stages. The first stage extracts features and designs the scoring logic according to the scoring rules for English compositions (GRE, TOEFL, etc.), which serves as the basic scoring system. The second stage collects labeled corpus data and adjusts the weight parameters of the features, thereby correcting the scoring result of the first stage. Obtaining the revised preset scoring rule is accordingly divided into two stages, as shown in Fig. 2:

First stage, rule-based scoring: score the plurality of text features by the preset scoring rule, and obtain the second average score of the scores of the plurality of text features according to those scores and the preset weight values corresponding to the plurality of text features.

That is, the algorithm logic is designed from the scoring rule. Different application scenarios for English compositions have different scoring rules (GRE, TOEFL, etc.). From the description of the features in each score band of the scoring rule, the logic for awarding or deducting points is designed, which yields the score of each feature. The average of these scores is the first-stage score (the second average score).
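A rule of this kind can be sketched as a lookup from a feature value to the score band it falls in, followed by a weighted average. The band boundaries, point values, and feature names in `RULES` below are hypothetical placeholders, not values from GRE or TOEFL rubrics, and normalizing by the sum of the weights is an assumption.

```python
from bisect import bisect_right

# Hypothetical band rules: (upper cut points, points awarded per band).
# A value below the first cut earns points[0], and so on.
RULES = {
    "mean_word_length": ([3.5, 4.5, 5.5], [1, 2, 3, 4]),
    "mean_sentence_length": ([40.0, 80.0, 120.0], [1, 2, 3, 4]),
}


def score_feature(name, value):
    """Map a feature value to the points of the band it falls in."""
    cuts, points = RULES[name]
    return points[bisect_right(cuts, value)]


def second_average(scores, weights):
    """Weighted average of the feature scores (assumed normalization)."""
    total = sum(weights[name] for name in scores)
    return sum(scores[name] * weights[name] for name in scores) / total
```

For example, `score_feature("mean_word_length", 4.0)` falls in the second band, and `second_average` with equal weights reduces to the plain mean of the feature scores.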

Second stage, statistics-based scoring: score the plurality of text features by the preset labeled corpus data, and obtain the third average score of the scores of the plurality of text features.

That is, labeled corpus data is collected. The data comprise compositions and their corresponding human-assigned scores (raw scores), and the corpus data needs to cover every score band. The plurality of text features extracted from the composition to be corrected are scored by the labeled corpus data to obtain the third average score.
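The requirement that the labeled corpus cover every score band can be checked mechanically before the second stage runs. The sketch below assumes that bands are half-open intervals `(low, high)` and that the corpus is reduced to a list of human-assigned scores; both are illustrative choices, and the function name `uncovered_bands` is hypothetical.

```python
def uncovered_bands(human_scores, bands):
    """Return the score bands that contain no labeled composition.

    human_scores: human-assigned raw scores from the labeled corpus.
    bands: list of (low, high) pairs, each treated as low <= score < high.
    """
    return [
        (low, high)
        for low, high in bands
        if not any(low <= score < high for score in human_scores)
    ]
```

An empty result means every band has at least one labeled sample; any band that is returned indicates where more labeled compositions should be collected.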

After the above two stages, the second average score is compared with the third average score, and whether to revise the preset weight values of the preset scoring rule is determined according to the comparison result.

Specifically, the second average score is compared with the third average score to obtain the absolute value of the difference between the second average score and the third average score;

In another implementation, if the absolute value of the difference is greater than the preset difference, the preset weight values of the preset scoring rule are revised, and a fourth average score of the scores of the plurality of text features is obtained again with the revised weight values, until the absolute value of the difference between the fourth average score and the third average score is less than or equal to the preset difference.
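The revision loop can be sketched as follows. The source does not state how the weight values are revised, so the update rule here (move each weight in proportion to the remaining error and to how far that feature's score lies from the current average) is an assumption chosen for illustration, as are the names `revise_weights`, `step`, and `max_rounds`.

```python
def weighted_average(scores, weights):
    total = sum(weights[name] for name in scores)
    return sum(scores[name] * weights[name] for name in scores) / total


def revise_weights(scores, weights, third_average, preset_difference,
                   step=0.5, max_rounds=100):
    """Revise weights until the weighted average is close to third_average.

    Returns (weights, average). The update rule is a hypothetical choice:
    features scoring above the current average gain weight when the
    average must rise, and lose weight when it must fall.
    """
    weights = dict(weights)  # do not mutate the caller's preset weights
    average = weighted_average(scores, weights)
    for _ in range(max_rounds):
        if abs(average - third_average) <= preset_difference:
            break  # within the preset difference: no further revision
        error = third_average - average
        for name, score in scores.items():
            revised = weights[name] + step * error * (score - average)
            weights[name] = max(revised, 1e-6)  # keep weights positive
        average = weighted_average(scores, weights)
    return weights, average
```

With feature scores `{"word": 2, "sentence": 4, "paragraph": 3, "topic": 5}`, equal initial weights, a third average score of 4.0, and a preset difference of 0.05, the loop stops after a few rounds with the fourth average score inside the tolerance. A weighted average can never leave the range of the individual feature scores, so the loop cannot converge when the third average score lies outside that range; `max_rounds` bounds the work in that case.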

The revised preset scoring rule is obtained by the above method, and the composition to be corrected is then scored by the revised preset scoring rule, which gives a more accurate result. The two-stage method that fuses statistics and rules both reduces the large volume of corpus data that statistical methods require and helps address the comprehensiveness and accuracy problems faced when writing rules. In addition, the weight parameter of each text feature is determined by correcting it with the score difference, which makes the weight parameters more accurate.

The above method can correct compositions in batches and reduces the amount of data that must be collected and labeled, saving time and manpower. In addition, because it combines the statistics-based and rule-based technical schemes, its results are more robust.

Fig. 3 shows a schematic structural diagram of a system for automatically correcting English compositions provided by one embodiment of the present invention. As shown in Fig. 3, the system comprises:

An extraction module 31, configured to extract a plurality of text features of a composition to be corrected;

A scoring module 32, configured to score the plurality of text features by a revised preset scoring rule;

A first acquisition module 33, configured to obtain a first average score of the scores of the plurality of text features, and to use the first average score as the score of the composition to be corrected.
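The division into three modules can be sketched as a small class that wires together two pluggable callables. The class name `CorrectionSystem` and its field names are hypothetical, and the plain mean used in `grade` follows the "first average score" wording without assuming any weighting.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class CorrectionSystem:
    # Role of extraction module 31: composition text -> feature values.
    extract: Callable[[str], Dict[str, float]]
    # Role of scoring module 32: feature values -> per-feature scores.
    score: Callable[[Dict[str, float]], Dict[str, float]]

    def grade(self, essay: str) -> float:
        """Role of first acquisition module 33: average the feature scores."""
        scores = self.score(self.extract(essay))
        return sum(scores.values()) / len(scores)
```

Keeping extraction and scoring as separate callables mirrors the module boundaries in Fig. 3 and lets the scoring rule be swapped for its revised version without touching the feature extraction.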

In a preferred implementation of this embodiment, the extraction module 31 is configured to:

In a preferred implementation of this embodiment, the system further comprises a second acquisition module 34, not shown in Fig. 3, configured to obtain the revised preset scoring rule before the plurality of text features are scored by the revised preset scoring rule.

In a preferred implementation of this embodiment, the second acquisition module 34 is configured to:

Or

The above system corresponds one-to-one with the above method, and the implementation details of the method apply equally to the system. This embodiment therefore does not describe the specific implementation details of the system again.

The above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. A method for automatically correcting English compositions, characterized by comprising:

Extracting a plurality of text features of a composition to be corrected;

Scoring the plurality of text features by a revised preset scoring rule;

2. The method according to claim 1, characterized in that extracting the plurality of text features of the composition to be corrected comprises:

3. The method according to claim 1, characterized in that, before the plurality of text features are scored by the revised preset scoring rule, the method further comprises:

Obtaining the revised preset scoring rule.

4. The method according to claim 3, characterized in that obtaining the revised preset scoring rule comprises:

5. The method according to claim 4, characterized in that determining, according to the comparison result, whether to revise the preset weight values of the preset scoring rule comprises:

Or

6. A system for automatically correcting English compositions, characterized by comprising:

7. The system according to claim 6, characterized in that the extraction module is configured to:

8. The system according to claim 6, characterized in that the system further comprises a second acquisition module, configured to obtain the revised preset scoring rule before the plurality of text features are scored by the revised preset scoring rule.

9. The system according to claim 8, characterized in that the second acquisition module is configured to:

10. The system according to claim 9, characterized in that the second acquisition module is configured to:

Or