WO2017119875A1

WO2017119875A1 - Writing sample analysis

Info

Publication number: WO2017119875A1
Application number: PCT/US2016/012323
Authority: WO
Inventors: Lei Liu; Ehud Chatow; Jerry J LIU
Original assignee: Hewlett-Packard Development Company, L.P.
Priority date: 2016-01-06
Filing date: 2016-01-06
Publication date: 2017-07-13

Abstract

Examples associated with writing sample analysis are described. One example includes generating syntactic structure trees for a writing sample based on a syntax model derived from a corpus. A set of features of the writing sample is analyzed. At least one feature is analyzed based on the syntactic structure trees. The user is provided feedback derived from the set of features. The feedback includes suggestions to the user for improving the writing sample.

Description

WRITING SAMPLE ANALYSIS

BACKGROUND

[0001] One of the long standing pillars of education revolves around communicating using written words. Various techniques are used to develop students' vocabulary, logic, punctuation, and so forth as they age and progress through different levels of schooling. These techniques and skills are incorporated into a variety of national and/or local standards including, for example, Common Core, which seeks to emphasize the ability to synthesize and summarize texts, formulate arguments, and respond to source documents.

BRIEF DESCRIPTION OF THE DRAWINGS

[0002] The present application may be more fully appreciated in connection with the following detailed description taken in conjunction with the accompanying drawings, in which like reference characters refer to like parts throughout, and in which:

[0003] FIG. 1 illustrates example modules associated with writing sample analysis.

[0004] FIG. 2 illustrates a flowchart of example operations associated with writing sample analysis.

[0005] FIG. 3 illustrates another flowchart of example operations associated with writing sample analysis.

[0006] FIG. 4 illustrates an example system associated with writing sample analysis.

[0007] FIG. 5 illustrates another example system associated with writing sample analysis. [0008] FiG. 6 illustrates another flowchart of example operations associated with writing sampie anaiysis.

[0009] FiG - 7 illustrates an example computer in which example systems, and methods, and equivalents, may operate.

DETAILED DESCRIPTION

[0010] Systems and methods associated with writing sample analysis are described. In various examples analysis of writing samples may be performed based on the underlying syntax of the writing samples. This analysts may be used to generate feedback for the writing sampies. This feedback may be va!uab!e because it may provide to students or other users who submit the writing samples, information about how to improve their writing samples that cannot be explained by a mere grade or score.

[0011] In some examples, the analysis of the writing samples may be performed based on parsing a corpus into a syntax model, and on a set of features seiected by an instructor or other system administrator. The parsing of the corpus and the feature selection may allow emphasis of analysis on specific writing skills, skill levels, writing styles, and so forth. For example, different corpuses could be analyzed for different grade levels, fiction or non-fiction works, persuasive or informative writing, and so forth. Selected features may facilitate emphasizing skills such as vocabulary, grammar, logic, and so forth.

[0012] Figure 1 illustrates example modules associated with writing sample analysis. It should be appreciated that the items depicted in Figure 1 are illustrative examples and many different devices, modules, and so forth may operate in accordance with various examples.

[0013] Figure 1 illustrates an example syntax anaiysis module 110. Syntax anaiysis module 110 may analyze sentence syntax for a writing sample 100 to generate syntactic structure trees 120 for writing sampie 100. Syntactic structure trees 120 are representations of the way words of sentences are put together. Thus, syntactic structure trees 120 break down sentences into component parts to describe the sentence based on parts of speech (e.g., noun, verb, adjective) of the words in the sentence, and based on the way the words in the sentence are related to one another within the sentence (e.g., a verb phrase comprising of a verb and a noun). One example syntactic structure tree 120 is shown for a simple sentence in figure 1. Accordingly, more complex syntactic structure trees 120 will be generated for more complex sentences.

[0014] Generating the syntactic structure trees 120 for writing sample 100 may rely on a syntax model that describes grammar rules. Though a syntax mode! may be provided in advance, in other examples, the syntax mode! may be generated as a result of analyzing syntax of a corpus of works 115. For example, corpus 115 may include a set of modei works from which a syntax modei may be learned. This learning may be achieved using machine learning techniques such as a neural network. Using machine learning techniques to generate the syntax model may allow customization of the syntax model by selecting a corpus of documents that emphasizes certain writing skills or other traits. Further, buiiding a syntax model from a corpus may allow grammar rules to be Seamed instead of explicitly taught every time a new skill needed to be emphasized.

[0015] By way of illustration, a syntax mode! may be generated by generating syntactic structure trees from the corpus. These trees may then be used to train a machine learning application thai will develop a statistical model for correctness that can be used to accurately evaluate correctness of syntax of an input writing sample. Further, if sentences, clauses, words, and so forth of the corpus are tagged with labels (e.g., "simple sentence", "attributive clause", "subject clause"), these structures can be built into the syntax model as desirable for detection. This may allow an administrator to, for example, focus on these structures for analysis and/or feedback by emphasizing detection of these structures by building them into the syntax model.

[0016] Whether a syntax model is provided or generated, the syntax model may be used to generate syntactic structure trees 120 for writing sample 100. In examples where the structure tree was learned from a corpus 1 15 that emphasizes a specific writing skill or style, the syntactic structure trees 120 generated may highlight whether or not that specific skill or style was achieved in the writing sampie.

[0017] These syntactic structure trees 120 may in turn facilitate analyzing various features of writing sample 100. For example, structure trees of writing sample 100 may be compared against model syntactic structure trees generated from the corpus of works i 15, These features may facilitate providing feedback to a user 140 (e.g., a student) who created writing sampie 100.

[0018] This feedback may be generated by feedback analysis module 130 which may use the syntactic structure trees to provide specific feedback for improving writing sampie 100. This may be more useful to user 140 than a singie score or grade that does not explain how writing sample 100 may be improved.

[0019] By way of illustration, after mapping a writing sampie 100 into syntacticstructure trees 120, the syntactic structure trees 120 may be compared to syntactic structure trees 120 generated for works from corpus 115. This may aliow identification of errors in the writing samples by showing where a sentence from writing sample 100 has a syntactic structure tree 120 that does not match a syntactic structure tree 120 derived from a sample In corpus 115. In these cases, recommendations for fixing writing sampie 100 may be identified for later provision to user 140.

[0020] In another example, the syntactic structure trees 120 may be able to identify when certain word choices could be improved to enhance writing sample 100. By way of illustration if administrator 150 has emphasized a set of vocabulary words, feedback analysis module 130 may use the syntactic structure trees 120 to identify where these vocabulary words could have been used in the writing sampie 100, and suggest corresponding changes in feedback provided to user 140,

[0021] In addition to the syntactic structure trees 120, feedback analysis module 130 may rely on a set of metrics 135 that emphasizes, for example, specifie writing skills. This may cause feedback analysis module 130 to specifically provide feedback on those metrics 135. In some examples, the metrics 135 may be provided by an administrator 150. These metrics may relate to, for example, sentence structure, vocabulary, punctuation, lexical class usage, sample length, grammar, organization, and so forth. For example, feedback related vocabulary words may indicate places in writing sample 100 where those words couid have been used. Feedback related to organization may indicate areas that could. use emphasis or explanation.

[0022] To illustrate the above process, consider a scenario where a teacher {e.g., administrator 150) wants their students (e.g., users 140) to focus on correctly writing subject clauses- The syntactic structure trees for writing samples could be analyzed to identify where subject clauses are in sentences, and then correciness of the subject ciauses in writing samples 100 received from the studenis couid be analyzed. In addition to letting the students know how well they performed at writing subject ciauses as an abstract value (e,g„ a percentage of correctly written subject clauses), specific feedback correcting the incorrectly written subject clauses could be provided to the students,

[0023] In some examples, identifying correctness of syntactic structures in a writing sample may be achieved by identifying an average difference between a ground truth syntactic structure tree generated from a training corpus and the syntactic structure 120 generated from a writing sample 100 . To measure a difference between two structure trees, an adjacency matrix for each tree may be generated according to equation 1 :

[0024] In equation 1, A_ij is the proximity between two nodes c_i and c_j And num _hops(c_i→ c_j} is the number of hops to traverse from node c_i to c_j. To measure the diff erence between adjacency matrices A and A for two tree structures, mean square error may be computed between the two matrices according to equation 2;

[0025] A smaller mean squared error between two adjacency matrices indicates that their respective syntactic structure trees are similar. Consequently, if a syntactic structure tree 120 of a writing sample 100 is similar to the syntactic structure tree of a corpus, it is likely that the grammar of the writing sample is correct. Values can be adjusted to account for margins of error when providing feedback, and suggestions can be made based on the syntactic structure tree of the corpus to indicate a proper sentence structure to a user.

[0026] As discussed above, in examples where user 140 is a student, administrator 150 may be their writing instructor. Consequently, feedback analysis module 130 may also provide feedback generated from writing sample 100 to administrator 50. This may allow the administrator 150 to identify areas where user 140 can improve their writing abilities and provide further instruction. Additionally, the feedback may allow administrator 150 to adjust the metrics 135 used by feedback analysis module 130. This may be desirable as user 140 learns new material, masters writing skills, and so forth, so that future writing samples 100 may be analyzed using a different set of metrics.

[0027] Feedback analysis module 130 may include additional features that may aid user 140. For example, over time, user 140 may submit several writing samples 100 for feedback. If user 140 continues to make errors involving a specific writing skill, these repeated errors could be emphasized to user 140 so user 140 is more likely to learn to correct these errors going forward (e.g., by identifying a resource that user 100 can access to learn about the error). Thus, feedback analysis may provide certain pattern recognition tasks for writing samples 100 over time.

[0028] Additionally, feedback analysis module may allow administrator 150 to access data generated by analyzing writing samples 100 to facilitate improved instruction of user 140 and/or an entire group of users 140 (e.g., a classroom of students). This data may identify to administrator 150, writing skills where user 140 has continually struggled, writing skills where a disproportionate number of the group of users 140 are struggling, and so forth.

[0029] It is appreciated that, in the following description, numerous specific details are set forth to provide a thorough understanding of the examples. However, it is appreciated that the examples may be practiced without limitation to these specific details. In other instances, methods and structures may not be described in detail to avoid unnecessarily obscuring the description of the examples. Also, the examples may be used in combination with each other. [0030] "Module", as used herein, includes but is not limited to hardware, firmware, software stored on a computer-readable medium or in execution on a machine, and/or combinations of each to perform a function(s) or an action(s), and/or to cause a function or action from another module, method, and/or system. A module may include a software controlled microprocessor, a discrete module, an analog circuit, a digital circuit, a programmed module device, a memory device containing instructions, and so on. Modules may include gates, combinations of gates, or other circuit components. Where multiple logical modules are described, it may be possible to incorporate the multiple iogicai modules into one physical module. Similarly, where a single logical module is described, it may be possible to distribute that single logical module between multiple physical modules.

[0031] Figure 2 illustrates an example method 200 associated with writing sample analysis. Method 200 may be embodied on a non-transitory computer- readable medium storing processor-executable instructions. The instructions, when executed by a processor, may cause the processor to perform method 200. In other examples, method 200 may exist within logic gates and/or RAM of an application specific integrated circuit.

[0032] Method 200 includes generating syntactic structure trees for a writing sample at 230. As used herein, syntactic structure trees describe a hierarchy of parts of speech of sentences. Syntactic structure trees further identify relationships between words and clauses of the sentences. The syntactic structure trees may be generated based on a syntax model derived from a corpus. The syntax model may describe a set of grammar rules for identifying parts of speech of words, as well as how to build a syntax structure tree of words in the sentence. In some examples, the syntax model may be a set of syntactic structure trees generated for the corpus. The corpus may be a set of texts, documents,, or works. Specialized corpuses that are tailored to building a syntax model for analyzing a specific writing skill, skill level, writing style, and so forth may be used. In other examples, the corpus may be a set of model documents that have been vetted by experts to be grammatically correct or serve some other purpose (e.g., relate to a specific writing style or subject matter). The writing sample may be received from a user (e.g., a student). The writing sample may be a text document that the user themselves created. Though a template or outline may have been used in creation of the writing sample, the writing sample may be primarily the work product of the user.

[0033] Method 200 also includes analyzing a set of features for the writing sample at 240. At least one member of the set of features may be selected by an administrator (e.g., a teacher). At least one of the features may be analyzed based on the syntactic structure trees. The features may relate to, for example, sentence structure, vocabulary, punctuation, lexical class usage, writing sample length, grammar, organization, and so forth.

[0034] By way of illustration, if the administrator seeks to evaluate whether a student has learned to use certain words in certain parts of sentences (e.g., specific vocabulary words in subordinate clauses), at action 240, these parts of sentences in the writing sample, as identified by the syntactic structure trees, may be examined to see whether these specified words appeared. In another example, analysis of syntactic structure trees may be performed to identify whether a user has properly used adverbs within the writing sample.

[0035] Method 200 also includes providing feedback to the user at 250. The feedback may be derived from the set of features analyzed at action 240. As used herein, feedback is intended to convey information to a user about that goes beyond a mere score or grade of the writing sample. By way of illustration, in some examples, the feedback derived from the set of features may include suggestions to the user for improving the writing sample. In some examples, these improvements may be specifically tailored to the at least one member of the set of features selected by the administrator. In other examples, the feedback derived from the set of features may identify a writing skill resource for the user based on a weakness identified in the writing sample. Feedback may also convey what a user did well, resources that the user may access to work on a skill, and so forth.

[0036] Figure 3 illustrates a method 300 associated with writing sample analysis. Method 300 includes several actions similar to those described above with reference to method 200 (figure 2). For example, method 300 includes generating syntactic structure trees from a writing sample based on a syntax model at 330, analyzing feaiures of the writing sample at 340, and providing feedback to a user based on the writing sample at 350.

[0037] Method 300 also includes generating the syntax model from the corpus at 310. In this example, a member of the set of features analyzed at action 340 may be selected based on the generation of the syntax model. This may be achieved when the corpus has been tagged to emphasize certain features of writing of the corpus.

[0038] Method 300 also includes identifying parts of speech for words in the writing sample at 320. Generally, identifying parts of speech may include assigning tags to each word thai describe what part of speech (e.g., noun, verb, adjective) each word corresponds to. In some examples it may be helpful to perform other preprocessing steps to facilitate tagging parts of speech and/or other actions of method 300, For example, stemming words to their base forms (e.g., cats to cat, fishing or fisher to fish) may speed up certain actions by reducing the size of dictionaries that must be searched for the actions.

[0039] Figure 4 illustrates a system 400 associated with writing sample analysis. System 400 includes a data store 410, Data store 410 may store syntactic structure trees describing valid sentences structures. The set of syntactic structure trees may be generated from a corpus.

[0040] System 400 also includes a feedback module 420. Feedback module 420 may provide feedback to a user regarding a writing sample 499 provided by the user. The feedback may be provided for a set of metrics. At least one metric may be analyzed based on the syntactic structure trees in data store 410. The feedback may include suggestions for improving writing sample 499.

[0041] System 400 also includes a metric selection module 430. Metric selection module 430 may allow an administrator to adjust the set of metrics. This may allow the administrator to adjust the set of metrics to emphasize feedback for a specified writing skill. [0042] System 500 illustrates a system 500 associated with writing sample analysis. System 500 includes several items similar to those described above with reference to system 400 (figure 4), For examples, system 500 includes a data store 510, a feedback module 520 to provide feedback to a user regarding a writing sample 599, and a metric selection module 530.

[0043] System 500 also includes a feedback data store 540. Feedback data store 540 may store feedback provided to the user by feedback module 520. In some examples, feedback data store 540 may also store feedback provided to a set of users including the user. Consequently, system 500 includes a group analysis module 560. Group analysis module may generate a group report for an administrator. The group report may organize results from feedback provided to members of the set of users to show writing habits of members of the set of users.

[0044] System 500 also includes a feedback comparison module 550. Feedback comparison module 550 may provide a historical report to at least one of the user and an administrator. The historical report may organize results from the feedback provided to the user by feedback module 520, and past feedback provided to the user. Thus, the historical report may show writing habits of the user.

[0045] System 500 also includes a logic analysis module 570. Logic analysis module may analyze a logical structure of writing sample 599, The logical structure of the writing sample may be analyze based at least on the syntactic structure trees from data store 510. Thus, feedback module 520 may also provide feedback regarding the logical structure of the writing sample.

[0046] System 500 also includes a syntax analysis module. Syntax analysis module may generate the syntactic structure trees stored in data store 510 from a corpus of works. This may facilitate changing the syntactic structure trees over time.

[0047] Figure 6 illustrates a method 600 associated writing sample analysis. Method 600 includes receiving a signal indicating a set of metrics at 610. The signal may be received from an administrator. The set of metrics may be used for analyzing writing samples that will be received from members of a set of users. At least one metric may rely on a syntactic structure tree generated from a corpus.

[0048] Method 800 also includes receiving writing samples at 820. The writing samples may be received from the members of the set of users. Method 600 also includes providing feedback to the members of the set of users at 630, The feedback may be provided regarding users' respective writing samples as the writing samples are received. The feedback may be generated based on the members of the set of metrics. Further, the feedback may indicate suggestions for improving the writing samples.

[0049] In some examples, method 600 may include additional actions (not shown). For example, method 600 may include providing the feedback to the administrator. An updated set of metrics may subsequently be received from the administrator, and above actions may be repeated for the updated set of metrics and a new set of writing samples received from the users. Additionally, in some examples, method 600 may also inciude generating the syntactic structure trees from the corpus.

[0050] Figure 7 illustrates an example computer in which example systems and methods, and equivalents, may operate. The example computer may include components such as a processor 710 and a memory 720 connected by a bus 730. Computer 700 also includes a writing sample analysis module 740. Writing sample analysis module 740 may perform, alone or in combination, various functions described above with reference to the example systems, methods, apparatuses, and so forth. In different examples, Writing sample analysis module 740 may be implemented as a non-transitory computer-readable medium storing processor- executable instructions, in hardware, software, firmware, an application specific integrated circuit, and/or combinations thereof.

[0051] The instructions may also be presented to computer 700 as data 750 and/or process 780 that are temporarily stored in memory 720 and then executed by processor 710. The processor 710 may be a variety of processors including dual microprocessor and other multi-processor architectures. Memory 720 may include non-volatile memory (e.g., read only memory) and/or volatile memory (e.g., random access memory). Memory 720 may also be, for example, a magnetic disk drive, a solid state disk drive, a floppy disk drive, a tape drive, a flash memory card, an optical disk, and so on. Thus, memory 720 may store process 760 and/or data 750. Computer 700 may also be associated with other devices including computers, printers, peripherals, and so forth in numerous configurations (not shown).

[0052] It is appreciated that the previous description of the disclosed examples is provided to enable any person skilled in the art to make or use the present disclosure. Various modifications to these examples will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other examples without departing from the spirit or scope of the disclosure. Thus, the present disclosure is not intended to be limited to the examples shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

WHAT IS CLAIMED IS:

1. A method, comprising:

generating syntactic structure trees for a writing sample based on a syntax model derived from a corpus;

analyzing a set of features of the writing sample, where at least one of the features is analyzed based on the syntactic structure trees; and

providing, to a user, feedback derived from the set of features where the feedback includes suggestions to the user for improving the writing sample.

2. The method of claim 1 , comprising generating the syntax model from the corpus.

3. The method of claim 2, where at least one member of the set of features is selected as a result of generating the syntax model from the corpus.

4. The method of claim 1 , where at least one member of the set of features relates to at least one of, sentence structure, vocabulary, punctuation, lexical class usage, sample length, grammar, and organization.

5. The method of claim 1 , where at least one member of the set of features is selected by an administrator.

6. The method of claim 1 , where the feedback derived from the set of features identifies a writing skill resource for the user based on a weakness identified in the writing sample.

7. The method of claim 1 , comprising identifying parts of speech for words in the writing sample.

8. A system, comprising:

a data store to store syntactic structure trees describing valid sentence structures, where the set of syntactic structure trees is generated from a corpus; a feedback module to provide feedback to a user regarding a writing sample, where the feedback is provided for a set of metrics, where at least one metric is analyzed based on the syntactic structure trees, and where the feedback includes suggestions for improving the writing sample; and

a metric selection module to allow an administrator to adjust the set of metrics to emphasize feedback for a specified writing skill.

9. The system of claim 8, comprising:

a feedback data store to store the feedback provided to the user; and a feedback comparison module to provide a historical report to at least one of, the user and the administrator, where the historical report organizes results from the feedback provided to the user and past feedback provided to the user to show writing habits of the user.

10. The system of claim 9 where the feedback data Store Stores feedback provided to a set of users including the user, and where the system comprises a group analysis module to generate a group report for the administrator, where the group report organizes results from feedback provided to members of the set of users to show writing habits of members of the set of users.

11. The system of claim 9, comprising a logic analysis module to analyze a logical structure of the writing sample based at least on the syntactic structure trees, and where the feedback module provides feedback regarding the logical structure of the writing sample.

12. The system of claim 8, comprising a syntax analysis module to generate the syntactic structure trees from the corpus.

13. A non-transitory computer-readable medium storing computer- executable instructions that when executed by a computer cause the computer to: receive, from an administrator, a signal indicating a set of metrics for analyzing writing samples generated by members of a set of users, where at least one metric relies on a syntactic structure tree generated from a corpus;

receive writing samples;

provide feedback to the members of the set of users regarding respective writing samples as the writing samples are received, where the feedback is generated based on the members of the set of metrics and where the feedback indicates suggestions for improving the writing samples.

14. The non-transitory computer- readable medium of claim 13, where the instructions further cause the computer to:

provide the feedback to the administrator;

receive, an updated set of metrics from the administrator; and

repeat actions for the updated set of metrics and a new set of writing samples.

15. The non-transitory computer- readable medium of claim 13, where the instructions further cause the computer to generate the syntactic structure trees from the corpus.