WO1999008205A1 - A method and apparatus for authoring of customizable multimedia documents - Google Patents

A method and apparatus for authoring of customizable multimedia documents

Info

Publication number
WO1999008205A1
Authority
WO
WIPO (PCT)
Prior art keywords
document
class
annotations
html
Prior art date
Application number
PCT/CA1998/000771
Other languages
French (fr)
Inventor
Chrysanne Dimarco
Mary Ellen Foster
Original Assignee
Chrysanne Dimarco
Mary Ellen Foster
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from GBGB9716986.6A (GB9716986D0)
Priority claimed from GBGB9720133.9A (GB9720133D0)
Priority claimed from CA002230367A (CA2230367C)
Application filed by Chrysanne Dimarco, Mary Ellen Foster
Priority to AU87959/98A (AU8795998A)
Priority to EP98939453A (EP1002284A1)
Publication of WO1999008205A1
Priority to US09/502,233 (US6938203B1)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/221 Parsing markup language streams
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/166 Editing, e.g. inserting or deleting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/55 Rule-based translation
    • G06F40/56 Natural language generation

Definitions

  • This invention relates to a method and apparatus for the authoring of customizable multimedia documents and the adaptive generation of versions thereof for particular uses.
  • Natural Language Generation is a young but growing research field, whose goal is to build computer systems that automatically produce fluent and effective texts in various human languages.
  • NLG systems have used knowledge databases containing general world knowledge and specific domain knowledge, together with various linguistic resources (e.g., lexicons, grammars, discourse relations), to produce texts with limited variation in word choice, sentence and discourse structure, and virtually no variation in rhetorical style or pragmatic purpose.
  • a master document may refer to the complete superset of instructions to direct the actions of a robot on an assembly line.
  • this process of adaptive document generation should be easily implementable on a computer system at minimum possible cost and maximum possible ease of use to both the author of the master document and the user of the generation system.
  • IDAS The IDAS project (Reiter, Mellish, and Levine 1995) recognized the need to tailor both textual and non-textual information, including visual formatting, hypertext input, and graphics output. IDAS also tried to address the need for explicit authoring tools in the adaptive document generation process, but here the focus was on authoring at the knowledge-base level (i.e., at the level of a computer system's internal representation), while there still exists a need to provide an authoring tool that may be used by a non-computer-programmer professional writer who could compose the master document at the level of ordinary English, with additional markup as required (e.g., HTML markup to support an HTML presentation format for a resulting customized version of the document).
  • knowledge-base level i.e., at the level of a computer system's internal representation
  • additional markup e.g., HTML markup to support an HTML presentation format for a resulting customized version of the document.
  • IDAS relies mainly on canned texts and aims to provide the user with a means of navigating through the whole "hyperspace" of possible (canned) texts. There is however a need to provide for a much finer-grained degree of tailoring than the IDAS implementation. While IDAS relies mainly on canned texts, other adaptive generation systems do use more-dynamic text generation: the Migraine system (Carenini, Mittal, and Moore 1994) uses an approach to text planning that adaptively selects and structures the information to be given to a particular reader. However, Migraine relies on a large number of context-sensitive and user-sensitive "text plans" (i.e., text schemas) so that its methods of tailoring must of necessity be very specific to its particular domain.
  • context-sensitive and user-sensitive "text plans” i.e., text schemas
  • the PEBA-II system uses more-general text plans, as well as text templates, that it can choose from to adapt information to the individual reader, but the tailoring done is very specific, focussing on the user's familiarity with a topic.
  • the PIGLET system (Cawsey, Binsted, and Jones 1995) also uses a combination of text plans and text templates, but its tailoring is also quite specific in nature, mainly concerned with emphasizing material that is relevant to a particular patient.
  • the ILEX-0 system (Knott, Mellish, Oberlander, and O'Donnell 1996) is similar to the PIGLET model in its anticipation of all the possible texts that might be generated, but also includes annotations (e.g., a condition on a piece of canned text) to allow some local customization.
  • annotations e.g., a condition on a piece of canned text
  • very free and flexible use of annotations could lead to problems of repetitive text and inappropriate use of referring expressions in the resulting document, requiring textual repair.
  • the system should be able to support either an adaptive generation system with full facilities for selecting and repairing texts, as described by DiMarco, Hirst, Wilkinson, and Wanner (1995) and Hirst, DiMarco, Hovy, and Parsons (1997), or a simpler version of the system, based on "generation by selection only", i.e., with no facilities for textual repair, an implementation of which (called "WebbeDoc") is described by DiMarco and Foster (1997).
  • an author of a customizable document needs to be able to describe the variations of a document, which may be both textual and non-textual, at various levels of the document structure, together with the conditions for selecting each variation.
  • the author then needs a means of selecting all the appropriate variations for a particular purpose or audience, re-assembling the selected variations into a coherent document, and producing an appropriately customized version of the document, in potentially many different levels of representation (e.g., surface English, a deep syntactic or semantic representation for use in textual repair) and presentation formats (e.g., HTML, LaTeX).
  • levels of representation e.g., surface English, a deep syntactic or semantic representation for use in textual repair
  • presentation formats e.g., HTML, LaTeX
  • This invention seeks to provide a computer system for customizing an initial master document containing information for a multiplicity of versions of the document intended for the different purposes or different users, for a specific purpose or for a specific user.
  • a computer system for customizing an initial master document in accordance with a user-defined set of purpose parameters.
  • a data structure i.e., the customizable, "master”, document
  • a master document will therefore contain all the information that the system might need to include in any particular customized version of the document, together with annotations giving the selection conditions as to when each piece of information is relevant and other annotations giving linguistic and formatting information, including for multimedia elements of the document.
  • a further aspect of the invention provides for a method and apparatus for reading said data structure into a form implementable on a suitably programmed processor such that the implementation of the data structure can store both the form and content of a master document, i.e., all the elements of the document and their variations, and can also act as the process for selecting the relevant variations of the document, according to given values of input parameters specifying the intended purpose or intended user, and then generating the appropriately customized version of the document.
  • Figure 1 is a schematic diagram showing the architecture of the system
  • Figure 2 shows a generalised form of a data structure for specifying a customizable, master, document according to an embodiment of the present invention
  • Figure 3(a) shows a generalised form of a data substructure for use within the main data structure for specifying linguistic or presentation format information for a component of the master document
  • Figure 3(b) shows a generalised form of a data substructure for use within the main data structure for specifying hypertext links to parts of the main data structure or to other data structures of the form as specified in Figure 2 and provided in other source datafiles;
  • Figure 4 is a flowchart of the overall process of reading-in as input the data structure and generating as output a final customized version of the document;
  • Figure 5 is a graph showing the resolution process for a customized document generated according to an embodiment of the present invention.
  • the system comprises: a datafile 12 including a data structure 13 for defining a master document; a parser 14 for reading in the contents of the datafile 12 and for creating instances of the document-class data structures 18 in accordance with the general definitions of document-class data structures 16; a user input interface 20 for reading new values of purpose parameters 22 which are input by a selection engine 18.
  • the selection engine 18 uses the current values of the purpose parameters to select the relevant variations of each component of the document and to generate appropriately customized versions 26 of a document, which may also include hypertext links to new documents 28, which may themselves be customizable documents of the form illustrated in Figure 2.
  • purpose parameter means a parameter used in evaluating a selection condition associated with a particular variation of a document structure, where this parameter can be used in defining either a particular intended purpose or use of a customized version of the document or a particular intended user or audience for a customized version.
  • FIG 2 an embodiment of the data structure 13 according to the present invention is described, which shows the blocks, sub-blocks, fields, and subfields of the data structure.
  • the data structure 13 has the following main blocks of information: 1. Identification of purpose parameters and representation-level parameters, and their possible values.
  • Block 1 The purpose parameters.
  • the first block of the data structure identifies the purpose parameters, or user parameters, together with their possible values, that can be used in forming the Boolean expressions that give the conditions for selecting each variation of a document.
  • the first block also identifies the representation-level parameters, together with their possible values, that can be used in forming the Boolean expressions that give the conditions for selecting each desired level of representation of the sentences in the master document during the process of generating a customized version of the document.
  • Block 2 The toplevel object.
  • the toplevel object identifies the document-class instance which is the "root" element of the entire document. It is with this object that the resolution process for generating a customized version of the document begins.
  • Blocks 3-9 The program structures. Blocks 3-9 describe the program structures, the classes that implement the substructures of the data structure that specify the form and content of a customizable document. The program structures are related in the following manner:
  • a datafile describing a set of program structures is a particular example of a customizable document created for various uses.
  • the datafile may be divided into various parts.
  • the data structure may be divided into components, or elements, referred to as the classes Document, Section, Topic, Sentence, and Lexical, which each implement a substructure of the data structure that defines a component of a customizable document.
  • Each such substructure of the data structure also includes the variations of a component and the conditions for selecting the appropriate variation of a component.
  • the data structure contains basic components which are instances of the classes Word and Annotation, and other components, which are instances of the class External, for linking to other datafiles.
  • Block 3 The Documents.
  • Block 3 describes the instances of the class Document.
  • Each Document description must specify the following properties: - A list of its variations.
  • Each variation must be an instance of the class
  • Annotation An instance of the class Annotation can specify both textual and non-textual properties of a document or a component of a document in terms of a particular document-layout format, structure, or linguistic representation.
  • an Annotation object could specify multimedia elements of the document's layout, such as alignment of text, font size, background colour, text colour, and graphics; other Annotation objects could specify linguistic information such as discourse relations or coreference links.
  • Each variation of a Document class is then described as an instance of the class DocumentVariation.
  • Each DocumentVariation description specifies the following properties:
  • the condition for selecting this variation must be a Boolean expression composed from pairs of purpose parameters and their allowable values.
  • Block 4 The Sections. Block 4 describes the instances of the class Section. Each Section description specifies the following properties:
  • Each annotation must be an instance of the class Annotation.
  • Each variation of a Section class is then described as an instance of the class SectionVariation.
  • Each SectionVariation description specifies the following properties:
  • the condition for selecting this variation must be a Boolean expression composed from pairs of purpose parameters and their allowable values.
  • Block 5 The Topics. Block 5 describes the instances of the class Topic. Each Topic description specifies the following properties:
  • Topic-Variation A list of its annotations. Each annotation must be an instance of the class
  • TopicVariation Each variation of a Topic class is then described as an instance of the class TopicVariation.
  • Each TopicVariation description specifies the following properties: - The condition for selecting this variation. The condition must be a
  • Boolean expression composed from pairs of purpose parameters and their allowable values.
  • Block 6 The Sentences. Block 6 describes the instances of the class Sentence. Each Sentence description must specify the following properties: - A list of its variations. Each variation must be an instance of the class
  • the condition for selecting this variation must be a Boolean expression composed from pairs of purpose parameters and their allowable values. - A list of the components of this variation.
  • Each annotation must be an instance of the class Annotation.
  • Each SentenceRepLevel description must specify the following properties:
  • the condition for selecting this sentence representation must be a list of one or more representation-level parameters.
  • an instance of the class SentenceRepLevel may be a character string, with any Lexical components identified by surrounding reserved characters. This is a simplification made for ease of testing system prototypes, and does not limit the scope of the invention.
  • Block 7 The Lexicals. Block 7 describes the instances of the class Lexical. Each Lexical description must specify the following property:
  • LexicalVariation Each variation of a Lexical class is then described as an instance of the class LexicalVariation.
  • Each LexicalVariation description must specify the following properties:
  • the condition for selecting this variation must be a Boolean expression composed from pairs of purpose parameters and their allowable values.
  • if a component is a Sentence, Topic, or Section, then it is treated as a set of variations of a separate piece of a document, which will be resolved to select the appropriate version.
  • if a component is an External object, then it is treated as a whole complete customizable document. In this way, hypertext links to other customizable documents can be handled within the system.
  • Block 8 The Words. Block 8 describes the instances of the class Word. Each Word description must specify its associated string and its associated annotations. Each annotation must be an instance of the class Annotation.
  • Block 9 The Annotations.
  • Block 9 describes the instances of the class Annotation that will be used to insert all the relevant linguistic and formatting information into the customized version of the document to be output. A description of block 9 is not included in figure 2. Instead, the general structure of an Annotation class object is shown in figure 3(a).
  • the Annotation objects can be grouped into several distinct sub-taxonomies, one for each type of linguistic or formatting annotations that will be attached to the main master-document data structure.
  • one Annotation sub-taxonomy might specify details of HTML layout for the overall document and each component of the document; another Annotation sub-taxonomy might record properties of the discourse structure of the overall document and each component of the document, such as rhetorical relations and coreference links.
  • Each Annotation object has a property "parent" to reference its immediate ancestor in its (sub-)taxonomy.
  • Block 10 The Externals.
  • Block 10 describes the instances of the class External, which will be used to create hypertext links to other customizable documents specified in other datafiles. Each External description must specify the following attributes: The name of the file containing the external customizable document.
  • a user profile The user profile is a list of parameters that describe the user or audience for whom the document customization is being performed. The general structure of an External class object is shown in figure 3(b).
  • the data structure 13 allows an author to describe the structure of a customizable document (i.e., a master document).
  • the data structure has a recursive and object-oriented form and can be implemented using an object-oriented programming language so that a customizable document described in the form of the data structure can be implemented as an object-oriented computer program.
  • the elements of the data structure are related by both part relationships and by inheritance relationships, so that the relationships between the elements of a customizable document described in the form of the data structure and implemented as a corresponding object-oriented computer program can be recognized and maintained by the object hierarchy and inheritance mechanism of an object-oriented computer program.
  • an object-oriented computer program that implements the data structure is both the form and content of a customizable document and the process for selecting and generating an appropriately customized version of the document.
  • the data structure is generic in the sense that it can implement any customizable document given in the form of the data structure.
  • the data structure 13 describes the corpus text of a specific customizable document (i.e., a master document) having elements and structure, as shown in figures 2(a), 2(b) and 2(c) following.
  • the form of the data structure is defined by the set of general class objects 16 describing the elements and structure of a customizable document in terms of object-oriented program structures. These class objects are related by both part relationships and by inheritance relationships to be explained below.
  • the operation of the system 10 (the "tailoring engine") may be explained as follows:
  • the parser program 14 reads in and parses the master-document datafile 12 to recognize its structure and then maps the contents of the input datafile into class objects of name according to class names specified in the input data structure. Properties of the classes are also recognized according to information contained in the data structure.
  • the parser program 14 also acts as a document-class instantiator program which uses the parsed contents of the input datafile to dynamically create instances of the program structures identified by the general class objects described above. Properties of the classes are also dynamically assigned according to information contained in the data structure.
  • An integration of data structures with the main process of the system comprises the following:
  • a feature of the system is that given the current values of the purpose parameters, the instances of the program data structures, that is to say, the instances of the class objects describing the elements of the master document, execute themselves to select and generate the appropriate customized version of the document.
  • the core of the system, that is, the integration of program data structures with the selection process, is generic in terms of the following properties:
  • the system core is independent of the application: the only items that need be re-defined for a new application are the input datafile and the interface for reading the current values of the purpose parameters. This is discussed further below.
  • the system core is currently implemented in the Java programming language, but is independent of the underlying programming language, i.e., processor, to the extent that the programming language used must provide an object-oriented paradigm and a semantics for property inheritance that is consistent with the specification of the resolution process used in the system for generating a customized version of a document from the instances of the program data structures.
  • the process of generating a customized version of a document from the instances of the program structures is referred to as resolution and will be discussed later with reference to Figure 5.
  • a customized version of a document can be generated in any number of different levels of representation of its content (e.g., surface English; a syntactic or semantic representation to be used by a text-repair facility; and so on).
  • the different representations to be generated for any given application must be indicated by the representation-level parameters in the description of the document given by the data structure in the input datafile. This information is specified in block 1 of the data structure as described with reference to figures 2(a), 2(b), and 2(c) above.
  • Each different representation of the content of a customized version of the document will be generated along with a list of all the relevant annotations to the content. These annotations provide information on the multiple forms in which the document may subsequently be presented to a reader, and the linguistic information that may be used to guide subsequent repair of the customized document.
  • the customized document that is generated will also include all annotations to the content concerning the External objects that can be used to provide hypertext links to other customizable documents or to other applications of the system 10 (the "tailoring engine") from figure 1.
  • the process of generating a customized version of a master document is shown in Figure 4. There are two main stages to this process, the initial setup and the main program loop. In the initial setup, the input datafile is read in and parsed, and the appropriate instances of the various document classes described above are created.
  • the parser program 14 reads in the input datafile 12, which contains the data structure 13 giving the specification of a customizable document.
  • the driver program also acts as a class instantiator to create instances of all relevant document classes according to the input data structure.
  • Program links, i.e., references, are created between these class instances via the setting of their properties and assignment of their property values.
  • new values of the purpose parameters are read in, and a customized version of the document is generated as output for each specified level of representation.
  • a user interface in the form of a reader program 20 obtains the new values of the purpose parameters. In the latter instance, the parameter values may be entered interactively or may be read in from previously compiled profiles of user preferences stored in computer databases.
  • a selection engine 18 resolves the document instances created in the setup stage according to the current values of the purpose parameters to generate the appropriately customized version of the document.
  • the customized document is output in all specified levels of representation with all relevant linguistic and formatting information attached to each component of the document. If there are no more new purpose-parameter values to read in, the main program loop terminates.
  • FIG. 5 is a graph showing the resolution process for a customized document generated according to an embodiment of the present invention.
  • the pseudocode for the Resolve procedures, which implement the resolution process, is as follows:
  • An instance of DocumentObjectSet i.e., a set of references to resolved DocumentObjects, with one set member for each desired level of representation of this Variation instance
  • This property is a list of all the Annotation objects that apply to this document-class instance and must contain one Annotation object for each of the distinct sub-taxonomies in the overall class of Annotation objects.
  • a user interface in the form of a reader program to obtain new values of purpose parameters is provided.
  • This reader program returns as output an instance of WorkingCondition which is a set of purpose parameters and their values (implemented as a hashtable in one embodiment of the invention).
  • the main program first reads in the datafile, then calls the reader program to obtain a new instance of WorkingCondition. The main program will then start the resolution process for the toplevel Document object by passing it the WorkingCondition.
  • the output of each iteration of the main program will be a customized version of the document in all the levels of representation specified in the input datafile using the representation-level parameters. Each representation of the customized document is output for possible later processing by the application system.
  • the first example is for the customizable home page of the HealthDoc Project at the University of Waterloo (Waterloo, Canada).
  • the second example is for a master document giving basic health information on diabetes.
  • top-level object toplevel Document.webbedoc // The Documents and DocumentVariations Document webbedoc
  • Sentence sent-compliance-2
  • SentenceRepLevel sent-compliance-3 a-english
  • Sentence sent-compliance-4
  • Lexical lexSynonymsl I variations lexSynonyms 1 a lexSynonyms 1 b& annotations ⁇
  • condition-()& componentList-Topic.topic 1 & annotations html-sec-diabetes-subsec 1 - 1 a discourse-sec-diabetes-sec 11
  • repLevel english& componentList ⁇ Lexical.lexDiab Lexical.lexis Lexical.lexa Lexical.lexgroup Lexical.lexof Lexical.lexconds Lexical.lexin Lexical.lexwhich Lexical.lexglucose Lexical.lexlevels Lexical.lexare Lexical.
  • Sentence sent3a-l
  • condition ()
  • the program objects are:
  • BasicObject classes e.g., BasicDocument, BasicSection, etc.
  • VariationContainer classes e.g., Document, Section, Topic
  • VariationContainer classes are extensions of ResolvableObject and therefore have a Resolve method.
  • An instance of List (a list of the desired representation levels to output)
  • Toplevel Procedures These are the toplevel procedures used to read in a datafile containing a master document, create instances of the document-class objects, then loop to read in new values of the purpose parameters and generate an appropriately customized version of the document, with all appropriate annotations, at each level of representation as specified by the representation-level parameters given in the datafile.
  • if the line is a comment or a blank line then skip over it; else if the line specifies the toplevel DocumentObject then set the "toplevel" variable to reference this object; else if the line specifies the purpose parameters and their values then set the "purposeParameters" variable; else if the line specifies the possible representation levels then set the "repLevels" variable; else instantiate the specified document object
  • Algorithm call an application-specific interface to:
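
The datafile-reading procedure outlined in the list above (skip comments and blank lines, record the toplevel object, the purpose parameters, and the representation levels, otherwise instantiate a document object) can be sketched as follows. This is a minimal illustration in Python, not the Java implementation the source mentions. The line syntax is an assumption: the keywords `parameter` and `repLevels`, the `name=value1,value2` form, and the `ParsedDatafile` container are hypothetical, and only `toplevel Document.webbedoc`, `//` comments, and class-name-first object lines are suggested by the fragments quoted in the source.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ParsedDatafile:
    """Hypothetical container for what the reading loop collects."""
    toplevel: Optional[str] = None
    purpose_parameters: Dict[str, List[str]] = field(default_factory=dict)
    rep_levels: List[str] = field(default_factory=list)
    objects: Dict[str, str] = field(default_factory=dict)  # object name -> class name


def parse_datafile(text: str) -> ParsedDatafile:
    """Apply the skip / toplevel / parameters / repLevels / instantiate cascade."""
    parsed = ParsedDatafile()
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("//"):
            continue  # comment or blank line: skip over it
        keyword, _, rest = line.partition(" ")
        rest = rest.strip()
        if keyword == "toplevel":
            parsed.toplevel = rest  # reference to the root document object
        elif keyword == "parameter":  # assumed syntax: parameter name=v1,v2
            name, _, values = rest.partition("=")
            parsed.purpose_parameters[name.strip()] = [
                v.strip() for v in values.split(",")
            ]
        elif keyword == "repLevels":  # assumed syntax: repLevels level1 level2
            parsed.rep_levels = rest.split()
        else:
            # Any other line is taken to declare a document object:
            # "<ClassName> <objectName>", e.g. "Sentence sent-compliance-2".
            parsed.objects[rest] = keyword
    return parsed
```

A real parser would also read each object's properties (variations, conditions, component lists, annotations); those are omitted here because their exact syntax is not recoverable from the fragments above.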

Abstract

A method and apparatus for generating customizable documents comprising a datafile including a data structure for defining relationships between elements of a document and variations thereof; a parser for reading the datafile and for creating instances of document-class data structures in accordance with general document class definitions; a user input interface for inputting purpose parameters specifying a document variation; and a selection engine for utilizing the current values of the purpose parameters for generating customized versions of said document.
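
As a rough illustration of how a selection engine might use purpose parameters to pick document variations, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the class names `Variation` and `Container`, the modelling of a selection condition as a simple conjunction of parameter/value pairs (the source allows general Boolean expressions), and the rule that the first applicable variation wins. An empty condition is treated as always true.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union

# A working condition maps each purpose parameter to its current value.
WorkingCondition = Dict[str, str]


@dataclass
class Variation:
    """One variation of a component, guarded by a selection condition."""
    condition: Dict[str, str]  # assumed: all pairs must match (conjunction)
    components: List[Union["Container", str]] = field(default_factory=list)

    def applies(self, working: WorkingCondition) -> bool:
        return all(working.get(k) == v for k, v in self.condition.items())


@dataclass
class Container:
    """A document component (e.g. a sentence or lexical item) with variations."""
    name: str
    variations: List[Variation]

    def resolve(self, working: WorkingCondition) -> List[str]:
        # Assumed policy: select the first variation whose condition holds,
        # then resolve its components recursively.
        for variation in self.variations:
            if variation.applies(working):
                words: List[str] = []
                for component in variation.components:
                    if isinstance(component, Container):
                        words.extend(component.resolve(working))
                    else:
                        words.append(component)
                return words
        return []  # no variation applies: contribute nothing


lex = Container("lexDoctor", [
    Variation({"audience": "patient"}, ["your doctor"]),
    Variation({}, ["the physician"]),  # empty condition acts as a default
])
sentence = Container("sent1", [Variation({}, ["Ask", lex, "about insulin."])])
print(" ".join(sentence.resolve({"audience": "patient"})))
```

Changing the working condition to a different audience value would select the default lexical variation instead, which is the kind of customization the abstract describes at a much larger scale.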

Description

A Method and Apparatus for Authoring of Customizable Multimedia Documents
This invention relates to a method and apparatus for the authoring of customizable multimedia documents and the adaptive generation of versions thereof for particular uses.
Background of the Invention
Natural Language Generation (NLG) is a young but growing research field, whose goal is to build computer systems that automatically produce fluent and effective texts in various human languages. Generally, NLG systems have used knowledge databases containing general world knowledge and specific domain knowledge, together with various linguistic resources (e.g., lexicons, grammars, discourse relations), to produce texts with limited variation in word choice, sentence and discourse structure, and virtually no variation in rhetorical style or pragmatic purpose.
While various computational systems have been devised as solutions to the problem of producing documents with limited expressiveness in form and effect, none has presented a general solution to the problem of representing the kinds of knowledge that are needed to produce documents tailored to a specific use or audience in a manner that is systematic and extensible, and that further provides for the authoring of such documents by a non-computer-programmer professional writer. In addition, no system has yet presented a general solution for automatically integrating various aspects of document design (e.g., text, graphics, and presentation layout) into a single consistent representation format for use within a document intended for customization.
It is well known from studies in communication that presentation of information in a manner that is tailored to the characteristics of a particular audience can be significant both in maintaining the interest of the members of the audience and in effectively conveying the meaning of the information. For example, in the health care industry, it has been shown that information that is tailored to the characteristics of an individual patient can have a far greater effect in producing compliance with suggested medical regimens as compared to generic information. (Strecher et al. 1994 have done pioneering work in this area.) But as Strecher et al.'s behavioural studies also showed, a sizeable number of different medical and personality factors had to be taken into account in producing customized health information that would have the desired effect on the intended patient. DiMarco et al. (1995) noted that this kind of customization involves much more than producing each brochure or leaflet in half a dozen different versions for different audiences. Rather, the number of different combinations could easily be in the tens of thousands. While not all distinct combinations might need distinct customizations, it is nonetheless impossible in general to produce and distribute, in advance of need, the large number of different editions of each publication that is required for individual tailoring of information.
Thus there is a need for a computer system for the automated production of customized material that would tailor a general-purpose "master document" for a particular purpose or individual on demand. It must also be remembered that in the present context the term "document" is broadly used to define any textual or non-textual data, including multimedia and hypertext, having inter-relationships between the data, and that may be displayable or presented to a human audience in one of many presentations and formats.
As a further example, a master document may refer to the complete superset of instructions to direct the actions of a robot on an assembly line. In this instance, there exists a need to tailor or adaptively generate subsets of combinations of instructions for specific robot applications. Whether the master document is to be customized for a particular purpose, as in the robot example, or tailored for a specific audience, as in the case of health information, this process of adaptive document generation should be easily implementable on a computer system at minimum possible cost and maximum possible ease of use to both the author of the master document and the user of the generation system.
In the field of natural language processing, or computational linguistics, various computer systems have been implemented which attempt to produce customized documents. In the simplest cases, simple mail-merge techniques are used which enable "personalized" documents to be generated by using hand-coded decision rules indicating what information is to be included for various tailoring situations. However, these techniques result in very inflexible, often awkwardly structured, and poorly cohesive texts. Other systems utilize schema-based techniques to select and organize the content data according to simple document-template structures. But these templates are either too general-purpose to provide anything more than very coarse-grained adaptation in the resulting customized texts or too specific to the application in question to be appropriate for general use in adaptive generation systems. A number of projects have used more sophisticated techniques from NLG research to build adaptive generation systems for both written texts and hypertext documents. The IDAS project (Reiter, Mellish, and Levine 1995) recognized the need to tailor both textual and non-textual information, including visual formatting, hypertext input, and graphics output. IDAS also tried to address the need for explicit authoring tools in the adaptive document generation process, but here the focus was on authoring at the knowledge-base level (i.e., at the level of a computer system's internal representation), while there still exists a need to provide an authoring tool that may be used by a non-computer-programmer professional writer who could compose the master document at the level of ordinary English, with additional markup as required (e.g., HTML markup to support an HTML presentation format for a resulting customized version of the document). IDAS relies mainly on canned texts and aims to provide the user with a means of navigating through the whole "hyperspace" of possible (canned) texts.
There is however a need to provide for a much finer-grained degree of tailoring than the IDAS implementation. While IDAS relies mainly on canned texts, other adaptive generation systems do use more-dynamic text generation: the Migraine system (Carenini, Mittal, and Moore 1994) uses an approach to text planning that adaptively selects and structures the information to be given to a particular reader. However, Migraine relies on a large number of context-sensitive and user-sensitive "text plans" (i.e., text schemas) so that its methods of tailoring must of necessity be very specific to its particular domain. The PEBA-II system (Milosavljevic and Dale 1996) uses more-general text plans, as well as text templates, that it can choose from to adapt information to the individual reader, but the tailoring done is very specific, focussing on the user's familiarity with a topic. The PIGLET system (Cawsey, Binsted, and Jones 1995) also uses a combination of text plans and text templates, but its tailoring is also quite specific in nature, mainly concerned with emphasizing material that is relevant to a particular patient. The ILEX-0 system (Knott, Mellish, Oberlander, and O'Donnell 1996) is similar to the PIGLET model in its anticipation of all the possible texts that might be generated, but also includes annotations (e.g., a condition on a piece of canned text) to allow some local customization. However, very free and flexible use of annotations could lead to problems of repetitive text and inappropriate use of referring expressions in the resulting document, requiring textual repair.
None of the previous systems provides for a text-repair facility of the kind described by Hovy and Wanner (1996) and Wanner and Hovy (1996). The paradigm of adaptive document generation by "selection and repair", as introduced by DiMarco, Hirst, Wilkinson, and Wanner (1995), that is, selection of the relevant pieces of information from a master document, and then repair of any syntactic or stylistic problems in the resulting document by a text-repair facility, is central to the goals of a customizable document system. However, the system should be able to support either an adaptive generation system with full facilities for selecting and repairing texts, as described by DiMarco, Hirst, Wilkinson, and Wanner (1995) and Hirst, DiMarco, Hovy, and Parsons (1997), or a simpler version of the system, based on "generation by selection only", i.e., with no facilities for textual repair, an implementation of which (called "WebbeDoc") is described by DiMarco and Foster (1997).
In summary, an author of a customizable document needs to be able to describe the variations of a document, which may be both textual and non-textual, at various levels of the document structure, together with the conditions for selecting each variation. The author then needs a means of selecting all the appropriate variations for a particular purpose or audience, re-assembling the selected variations into a coherent document, and producing an appropriately customized version of the document, in potentially many different levels of representation (e.g., surface English, a deep syntactic or semantic representation for use in textual repair) and presentation formats (e.g., HTML, LaTeX).
None of the existing adaptive document generation systems has provided a generally applicable method and apparatus for describing all the different ways in which a document could be customized, or for enabling a non-computer-programmer author of a customizable document to specify the possible variations, or for selecting the appropriate variations and producing a customized version of a document.
Summary of the Invention
It is therefore an object of this invention to provide a system which mitigates to some extent the above outlined disadvantages. Also, the methods used by the present invention are more general than those used in previous systems, allowing not only the potential inclusion of text plans and schemas, text templates, and canned text, but also the dynamic generation of text that can then be subjected to very fine-grained revision and tailoring by a text-repair facility.
This invention seeks to provide a computer system that customizes, for a specific purpose or for a specific user, an initial master document containing the information for a multiplicity of versions of the document intended for different purposes or different users.
In accordance with this invention, there is provided a computer system for customizing an initial master document in accordance with a user-defined set of purpose parameters. In accordance with a further aspect of the invention there is provided a data structure (i.e., the customizable, "master", document) for specifying relationships between elements of the document and between elements of the document and their variations.
A master document will therefore contain all the information that the system might need to include in any particular customized version of the document, together with annotations giving the selection conditions as to when each piece of information is relevant and other annotations giving linguistic and formatting information, including for multimedia elements of the document.
A further aspect of the invention provides for a method and apparatus for reading said data structure into a form implementable on a suitably programmed processor such that the implementation of the data structure can store both the form and content of a master document, i.e., all the elements of the document and their variations, and can also act as the process for selecting the relevant variations of the document, according to given values of input parameters specifying the intended purpose or intended user, and then generating the appropriately customized version of the document.
Brief Description of the Drawings
A better understanding of the invention will be obtained by reference to the detailed description of a preferred embodiment below and in regard to the following drawings in which:
Figure 1 is a schematic diagram showing the architecture of the system;
Figure 2 shows a generalised form of a data structure for specifying a customizable, master, document according to an embodiment of the present invention;
Figure 3(a) shows a generalised form of a data substructure for use within the main data structure for specifying linguistic or presentation format information for a component of the master document;
Figure 3(b) shows a generalised form of a data substructure for use within the main data structure for specifying hypertext links to parts of the main data structure or to other data structures of the form as specified in Figure 2 and provided in other source datafiles;
Figure 4 is a flowchart of the overall process of reading in as input the data structure and generating as output a final customized version of the document; and
Figure 5 is a graph showing the resolution process for a customized document generated according to an embodiment of the present invention.
Detailed Description of a Preferred Embodiment
Referring to Figure 1, an architecture of a customizable document system, the "tailoring engine", is shown generally by numeral 10. The system comprises: a datafile 12 including a data structure 13 for defining a master document; a parser 14 for reading in the contents of the datafile 12 and for creating instances of the document-class data structures 18 in accordance with the general definitions of document-class data structures 16; and a user input interface 20 for reading new values of purpose parameters 22, which are input to a selection engine 18. The selection engine 18 uses the current values of the purpose parameters to select the relevant variations of each component of the document and to generate appropriately customized versions 26 of a document, which may also include hypertext links to new documents 28, which may themselves be customizable documents of the form illustrated in Figure 2.
The term "purpose parameter" as used herein means a parameter used in evaluating a selection condition associated with a particular variation of a document structure, where this parameter can be used in defining either a particular intended purpose or use of a customized version of the document or a particular intended user or audience for a customized version. These elements are explained in detail below. Furthermore, the datafile may be generated by an authoring tool 30 in accordance with the detailed explanation below.
Referring to Figure 2, an embodiment of the data structure 13 according to the present invention is described; the figure shows the blocks, sub-blocks, fields, and subfields of the data structure.
The data structure 13 has the following main blocks of information:
1. Identification of purpose parameters and representation-level parameters, and their possible values.
2. Identification of toplevel object (i.e., the main Document).
3. Definitions of main Document and any subDocuments.
4. Definitions of Sections.
5. Definitions of Topics.
6. Definitions of Sentences.
7. Definitions of Lexicals.
8. Definitions of Words.
9. Definitions of Annotations (replaces previous 9. Definitions of Formats).
10. Definitions of External objects.
Each of these blocks of the data structure will now be described in turn.
Block 1: The purpose parameters. The first block of the data structure identifies the purpose parameters, or user parameters, together with their possible values, that can be used in forming the Boolean expressions that give the conditions for selecting each variation of a document.
The first block also identifies the representation-level parameters, together with their possible values, that can be used in forming the Boolean expressions that give the conditions for selecting each desired level of representation of the sentences in the master document during the process of generating a customized version of the document.
Block 2: The toplevel object. The toplevel object identifies the document-class instance which is the "root" element of the entire document. It is with this object that the resolution process for generating a customized version of the document begins.
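The parameter declarations of Block 1 might be held in memory along the following lines. This is only a sketch under stated assumptions: the class name `ParameterDeclarations`, its methods, and the example parameter names and values are invented for illustration and do not come from the specification.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of Block 1: declared parameters and their possible values.
final class ParameterDeclarations {
    private final Map<String, Set<String>> allowed = new LinkedHashMap<>();

    // Declare a parameter together with every value it may take.
    ParameterDeclarations declare(String name, String... values) {
        allowed.put(name, new HashSet<>(Arrays.asList(values)));
        return this;
    }

    // True when the (name, value) pair may appear in a selection condition.
    boolean isValid(String name, String value) {
        Set<String> values = allowed.get(name);
        return values != null && values.contains(value);
    }

    public static void main(String[] args) {
        ParameterDeclarations block1 = new ParameterDeclarations()
            .declare("audience", "patient", "clinician")   // purpose parameter (invented)
            .declare("repLevel", "english", "syntactic");  // representation-level parameter (invented)
        System.out.println(block1.isValid("audience", "patient")); // true
        System.out.println(block1.isValid("audience", "robot"));   // false
    }
}
```

Checking each pair against the declarations when the datafile is read would let an authoring tool reject a condition that names an undeclared parameter or value before any document is generated.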
Blocks 3-9: The program structures. Blocks 3-9 describe the program structures, the classes that implement the substructures of the data structure that specify the form and content of a customizable document. The program structures are related in the following manner:
A datafile describing a set of program structures is a particular example of a customizable document created for various uses. The datafile may be divided into various parts.
Firstly, the data structure may be divided into components, or elements, referred to as the classes Document, Section, Topic, Sentence, and Lexical, which each implement a substructure of the data structure that defines a component of a customizable document. Each such substructure of the data structure also includes the variations of a component and the conditions for selecting the appropriate variation of a component.
In addition, the data structure contains basic components which are instances of the classes Word and Annotation, and other components, which are instances of the class External, for linking to other datafiles.
Block 3: The Documents.
In Figure 2, Block 3 describes the instances of the class Document. Each Document description must specify the following properties:
- A list of its variations. Each variation must be an instance of the class DocumentVariation.
- A list of its annotations. Each annotation must be an instance of the class Annotation.
An instance of the class Annotation can specify both textual and non-textual properties of a document or a component of a document in terms of a particular document-layout format, structure, or linguistic representation. For example, an Annotation object could specify multimedia elements of the document's layout, such as alignment of text, font size, background colour, text colour, and graphics; other Annotation objects could specify linguistic information such as discourse relations or coreference links.
Each variation of a Document class is then described as an instance of the class DocumentVariation. Each DocumentVariation description specifies the following properties:
- The condition for selecting this variation. The condition must be a Boolean expression composed from pairs of purpose parameters and their allowable values.
- A list of the components of this variation. Each component must either be an instance of the class Document or an instance of the class Section.
- A list of annotations for this variation. Each annotation must be an instance of the class Annotation.
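A selection condition of the kind described, a Boolean expression built from pairs of purpose parameters and values, could be represented as a small composable interface. The sketch below is an assumption made for illustration: the interface name `SelectionCondition` and the methods `is`, `and`, `or`, and `negate` are hypothetical, and the current parameter values are assumed to arrive as a string-to-string map.

```java
import java.util.Map;

// Hypothetical sketch: a selection condition as a Boolean expression over
// (purpose parameter, value) pairs. All names here are illustrative.
interface SelectionCondition {
    // Evaluate the expression against the current purpose-parameter values.
    boolean satisfiedBy(Map<String, String> currentValues);

    // Leaf: the named parameter currently has the given value.
    static SelectionCondition is(String parameter, String value) {
        return values -> value.equals(values.get(parameter));
    }

    default SelectionCondition and(SelectionCondition other) {
        return values -> satisfiedBy(values) && other.satisfiedBy(values);
    }

    default SelectionCondition or(SelectionCondition other) {
        return values -> satisfiedBy(values) || other.satisfiedBy(values);
    }

    default SelectionCondition negate() {
        return values -> !satisfiedBy(values);
    }
}
```

With these invented parameter names, `SelectionCondition.is("smoker", "yes").and(SelectionCondition.is("age", "senior"))` would hold only for a profile that sets both pairs.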
Block 4: The Sections. Block 4 describes the instances of the class Section. Each Section description specifies the following properties:
- A list of its variations. Each variation must be an instance of the class SectionVariation.
- A list of its annotations. Each annotation must be an instance of the class Annotation.
Each variation of a Section class is then described as an instance of the class SectionVariation. Each SectionVariation description specifies the following properties:
- The condition for selecting this variation. The condition must be a Boolean expression composed from pairs of purpose parameters and their allowable values.
- A list of the components of this variation. Each component must either be an instance of the class Section or an instance of the class Topic.
- A list of the annotations for this variation. Each annotation must be an instance of the class Annotation.
Block 5: The Topics. Block 5 describes the instances of the class Topic. Each Topic description specifies the following properties:
- A list of its variations. Each variation must be an instance of the class TopicVariation.
- A list of its annotations. Each annotation must be an instance of the class Annotation.
Each variation of a Topic class is then described as an instance of the class TopicVariation. Each TopicVariation description specifies the following properties:
- The condition for selecting this variation. The condition must be a Boolean expression composed from pairs of purpose parameters and their allowable values.
- A list of the components of this variation. Each component must either be an instance of the class Topic or an instance of the class Sentence.
- A list of the annotations for this variation. Each annotation must be an instance of the class Annotation.
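The containment rules stated for Blocks 3 to 5 (a Document variation holds Documents or Sections, a Section variation holds Sections or Topics, a Topic variation holds Topics or Sentences) can be written down as a small lookup. The enum `ComponentKind` and its methods are hypothetical names chosen for this sketch; Sentence components are left out here because Block 6 treats them separately.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical sketch: which component kinds each variation may contain,
// as stated for Blocks 3 to 5. The enum and method names are illustrative.
enum ComponentKind {
    DOCUMENT, SECTION, TOPIC, SENTENCE;

    // Kinds that may appear as components of a variation of this kind.
    Set<ComponentKind> allowedComponents() {
        switch (this) {
            case DOCUMENT: return EnumSet.of(DOCUMENT, SECTION);
            case SECTION:  return EnumSet.of(SECTION, TOPIC);
            case TOPIC:    return EnumSet.of(TOPIC, SENTENCE);
            default:       return EnumSet.noneOf(ComponentKind.class); // not modelled in this sketch
        }
    }

    // True when a variation of this kind may list the given kind as a component.
    boolean mayContain(ComponentKind child) {
        return allowedComponents().contains(child);
    }
}
```

Because each kind may also contain its own kind, the structure is recursive: a Section can nest Sections to any depth before reaching Topics.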
Block 6: The Sentences. Block 6 describes the instances of the class Sentence. Each Sentence description must specify the following properties:
- A list of its variations. Each variation must be an instance of the class SentenceVariation.
- A list of its annotations. Each annotation must be an instance of the class Annotation.
Each variation of a Sentence class is then described as an instance of the class SentenceVariation. Each SentenceVariation description must specify the following properties:
- The condition for selecting this variation. The condition must be a Boolean expression composed from pairs of purpose parameters and their allowable values.
- A list of the components of this variation.
- A list of the representations of this variation. Each representation must be an instance of the class SentenceRepLevel.
- A list of annotations for this variation. Each annotation must be an instance of the class Annotation.
Each SentenceRepLevel description must specify the following properties:
- The condition for selecting this sentence representation. The condition must be a list of one or more representation-level parameters.
- A list of the components of this variation. Each component must be an instance of the class Lexical.
- A list of the annotations for this variation. Each annotation must be an instance of the class Annotation.
In the current implementation of the system, an instance of the class SentenceRepLevel may be a character string, with any Lexical components identified by surrounding reserved characters. This is a simplification made for ease of testing system prototypes, and does not limit the scope of the invention.
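The string form of a SentenceRepLevel can be illustrated with a short substitution routine. The specification does not say which characters are reserved, so the curly braces used here are purely an assumption, as are the class name `SentenceTemplate`, the method `fill`, and the example sentence.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: a sentence representation held as one string, with
// Lexical references wrapped in reserved characters. The braces and all
// names are assumptions; the source does not say which characters are reserved.
final class SentenceTemplate {
    private static final Pattern LEXICAL_REF = Pattern.compile("\\{([^{}]+)\\}");

    // Replace each {name} with the already-resolved string for that Lexical.
    // A reference with no resolved string is left in place unchanged.
    static String fill(String template, Map<String, String> resolvedLexicals) {
        Matcher m = LEXICAL_REF.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String word = resolvedLexicals.getOrDefault(m.group(1), m.group(0));
            m.appendReplacement(out, Matcher.quoteReplacement(word));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Prints: Take the tablet daily.
        System.out.println(fill("Take the {medication} daily.", Map.of("medication", "tablet")));
    }
}
```

`Matcher.quoteReplacement` is used so that a resolved word containing `$` or a backslash is inserted literally rather than being read as a group reference.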
Block 7: The Lexicals. Block 7 describes the instances of the class Lexical. Each Lexical description must specify the following properties:
- A list of its variations. Each variation must be an instance of the class LexicalVariation.
- A list of its annotations. Each annotation must be an instance of the class Annotation.
Each variation of a Lexical class is then described as an instance of the class LexicalVariation. Each LexicalVariation description must specify the following properties:
- The condition for selecting this variation. The condition must be a Boolean expression composed from pairs of purpose parameters and their allowable values.
- A character string associated with this LexicalVariation instance.
- A list of annotations for this variation. Each annotation must be an instance of the class Annotation.
- A single component, which may be an instance of any of the classes External, Word, Lexical, Sentence, Topic, Section, or Document.
Each of these cases is dealt with as follows:
1. If a component is a Word, then it is treated as a simple string to be concatenated to the result returned.
2. If a component is a Lexical, then it is treated as a set of variations of a word, which will be resolved to select the appropriate version of the word. In this way, near-synonymy can be handled within the system.
3. If a component is a Sentence, Topic, or Section, then it is treated as a set of variations of a separate piece of a document, which will be resolved to select the appropriate version. In this way, adaptive hypertext can be handled within the system.
4. If a component is a Document, then it is treated as a whole complete document. In this way, hypertext links to other documents internal to the same datafile can be handled within the system.
5. If a component is an External object, then it is treated as a whole complete customizable document. In this way, hypertext links to other customizable documents can be handled within the system.
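The handling of near-synonymy in case 2 can be sketched as a Lexical that holds ordered variations and returns the string of the first one whose condition holds. The class `LexicalSketch`, the use of `Predicate` for conditions, and the example parameter "register" are assumptions for illustration only; annotations and the single component of a LexicalVariation are omitted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical sketch of a Lexical: ordered variations, each pairing a
// selection condition with a string. Names and parameters are illustrative.
final class LexicalSketch {
    private final List<Predicate<Map<String, String>>> conditions = new ArrayList<>();
    private final List<String> strings = new ArrayList<>();

    // Add one variation: its selection condition and its associated string.
    LexicalSketch variation(Predicate<Map<String, String>> condition, String text) {
        conditions.add(condition);
        strings.add(text);
        return this;
    }

    // Return the string of the first variation whose condition holds, or null if none does.
    String resolve(Map<String, String> purposeValues) {
        for (int i = 0; i < conditions.size(); i++) {
            if (conditions.get(i).test(purposeValues)) {
                return strings.get(i);
            }
        }
        return null;
    }
}
```

For example, a Lexical built with `.variation(v -> "formal".equals(v.get("register")), "physician")` followed by `.variation(v -> true, "doctor")` would yield "physician" for a formal reader and "doctor" otherwise.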
Block 8: The Words. Block 8 describes the instances of the class Word. Each Word description must specify its associated string and its associated annotations. Each annotation must be an instance of the class Annotation.
Block 9: The Annotations. Block 9 describes the instances of the class Annotation that will be used to insert all the relevant linguistic and formatting information into the customized version of the document to be output. A description of block 9 is not included in Figure 2. Instead, the general structure of an Annotation class object is shown in Figure 3(a).
The Annotation objects can be grouped into several distinct sub-taxonomies, one for each type of linguistic or formatting annotations that will be attached to the main master-document data structure. For example, one Annotation sub-taxonomy might specify details of HTML layout for the overall document and each component of the document; another Annotation sub-taxonomy might record properties of the discourse structure of the overall document and each component of the document, such as rhetorical relations and coreference links. Each Annotation object has a property "parent" to reference its immediate ancestor in its (sub-)taxonomy.
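The "parent" property can be pictured as a simple upward link within a sub-taxonomy. In the sketch below only the name "parent" comes from the text; the class `AnnotationNode`, the `name` field, the `pathToRoot` method, and the example annotation names are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of an Annotation with a "parent" reference into its
// (sub-)taxonomy. Field names other than "parent" are assumptions.
final class AnnotationNode {
    final String name;
    final AnnotationNode parent; // null at the root of a sub-taxonomy

    AnnotationNode(String name, AnnotationNode parent) {
        this.name = name;
        this.parent = parent;
    }

    // Names from this annotation up to the root of its sub-taxonomy.
    List<String> pathToRoot() {
        List<String> path = new ArrayList<>();
        for (AnnotationNode a = this; a != null; a = a.parent) {
            path.add(a.name);
        }
        return path;
    }
}
```

Walking the chain in this way is one means by which a consumer of the output could tell which sub-taxonomy (for instance, HTML layout versus discourse structure) an attached annotation belongs to.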
Block 10: The Externals. Block 10 describes the instances of the class External, which will be used to create hypertext links to other customizable documents specified in other datafiles. Each External description must specify the following attributes:
- The name of the file containing the external customizable document.
- A user profile. The user profile is a list of parameters that describe the user or audience for whom the document customization is being performed.
The general structure of an External class object is shown in Figure 3(b).
The data structure 13 according to the present invention allows an author to describe the structure of a customizable document (i.e., a master document). The data structure has a recursive and object-oriented form and can be implemented using an object-oriented programming language so that a customizable document described in the form of the data structure can be implemented as an object-oriented computer program.
The elements of the data structure are related by both part relationships and by inheritance relationships, so that the relationships between the elements of a customizable document described in the form of the data structure and implemented as a corresponding object-oriented computer program can be recognized and maintained by the object hierarchy and inheritance mechanism of an object-oriented computer program.
Moreover, an object-oriented computer program that implements the data structure is both the form and content of a customizable document and the process for selecting and generating an appropriately customized version of the document. Thus the data structure is generic in the sense that it can implement any customizable document given in the form of the data structure.
The data structure 13 describes the corpus text of a specific customizable document (i.e., a master document) having elements and structure, as shown in figures 2(a), 2(b) and 2(c) following.
The form of the data structure is defined by the set of general class objects 16 describing the elements and structure of a customizable document in terms of object-oriented program structures. These class objects are related by both part relationships and by inheritance relationships to be explained below.
The operation of the system 10 (the "tailoring engine") may be explained as follows:
The parser program 14 reads in and parses the master-document datafile 12 to recognize its structure and then maps the contents of the input datafile into class objects named according to the class names specified in the input data structure. Properties of the classes are also recognized according to information contained in the data structure.
The parser program 14 also acts as a document-class instantiator program which uses the parsed contents of the input datafile to dynamically create instances of the program structures identified by the general class objects described above. Properties of the classes are also dynamically assigned according to information contained in the data structure.
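One way a parser could double as a class instantiator is to keep a table from class names to factories and create an instance for each entry it parses. The sketch below assumes that parsing has already produced (class name, identifier) pairs; the datafile syntax is not modelled, and `InstantiatorSketch`, `DocObject`, `instantiate`, and `lookup` are invented names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: map class names found in a parsed datafile to
// factories that create the corresponding document-class instances.
final class InstantiatorSketch {
    // Minimal stand-in for a document-class instance.
    static class DocObject {
        final String className;
        final String id;
        DocObject(String className, String id) {
            this.className = className;
            this.id = id;
        }
    }

    private final Map<String, Function<String, DocObject>> factories = new HashMap<>();
    private final Map<String, DocObject> instances = new HashMap<>();

    InstantiatorSketch() {
        // Register one factory per class name that the datafile may mention.
        for (String cls : new String[] {"Document", "Section", "Topic", "Sentence", "Lexical", "Word"}) {
            factories.put(cls, id -> new DocObject(cls, id));
        }
    }

    // Create and register an instance for one parsed (class name, identifier) entry.
    DocObject instantiate(String className, String id) {
        Function<String, DocObject> factory = factories.get(className);
        if (factory == null) {
            throw new IllegalArgumentException("Unknown class in datafile: " + className);
        }
        DocObject created = factory.apply(id);
        instances.put(id, created);
        return created;
    }

    // Look up a previously created instance, e.g. to set a reference property.
    DocObject lookup(String id) {
        return instances.get(id);
    }
}
```

Keeping the created instances in a table by identifier is what would allow references between instances to be filled in once all of them exist.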
The data structures are integrated with the main process of the system as follows: the instances of the class objects dynamically generated from the input datafile provide two simultaneous functions:
1. The program data structures describing the elements, structure, and content of the master document.
2. The selection process for generating the appropriate customized version of the document.
Thus, a feature of the system is that given the current values of the purpose parameters, the instances of the program data structures, that is to say, the instances of the class objects describing the elements of the master document, execute themselves to select and generate the appropriate customized version of the document.
The core of the system, that is, the integration of program data structures with the selection process, is generic in terms of the following properties:
Application-independent. The system core is independent of the application: the only items that need be re-defined for a new application are the input datafile and the interface for reading the current values of the purpose parameters. This is discussed further below.
Platform-independent. The system core is currently implemented in the Java programming language, so is platform-independent to the same extent as Java itself.
Processor-independent. The system core is currently implemented in the Java programming language, but is independent of the underlying programming language, i.e., processor, to the extent that the programming language used must provide an object-oriented paradigm and a semantics for property inheritance that is consistent with the specification of the resolution process used in the system for generating a customized version of a document from the instances of the program data structures.
The process of generating a customized version of a document from the instances of the program structures is referred to as resolution and will be discussed later with reference to Figure 5.
A customized version of a document can be generated in any number of different levels of representation of its content (e.g., surface English; a syntactic or semantic representation to be used by a text-repair facility; and so on). The different representations to be generated for any given application must be indicated by the representation-level parameters in the description of the document given by the data structure in the input datafile. This information is specified in block 1 of the data structure as described with reference to figures 2(a), 2(b), and 2(c) above.
Each different representation of the content of a customized version of the document will be generated along with a list of all the relevant annotations to the content. These annotations provide information on the multiple forms in which the document may subsequently be presented to a reader, and the linguistic information that may be used to guide subsequent repair of the customized document. The customized document that is generated will also include all annotations to the content concerning the External objects that can be used to provide hypertext links to other customizable documents or to other applications of the system 10 (the "tailoring engine") from Figure 1.
The process of generating a customized version of a master document is shown in Figure 4. There are two main stages to this process, the initial setup and the main program loop.
In the initial setup, the input datafile is read in and parsed, and the appropriate instances of the various document classes described above are created. The parser program 14 reads in the input datafile 12, which contains the data structure 13 giving the specification of a customizable document. As it reads in the file, the driver program also acts as a class instantiator to create instances of all relevant document classes according to the input data structure. Program links, i.e., references, are created between these class instances via the setting of their properties and assignment of their property values.
In each iteration of the main program loop, new values of the purpose parameters are read in, and a customized version of the document is generated as output for each specified level of representation. A user interface in the form of a reader program 20 obtains the new values of the purpose parameters. The parameter values may be entered interactively or may be read in from previously compiled profiles of user preferences stored in computer databases. This allows for the mass customization of information for high volumes of individual users with diverse characteristics, such as in the mass production of personalised financial investment advice. A selection engine 18 resolves the document instances created in the setup stage according to the current values of the purpose parameters to generate the appropriately customized version of the document. The customized document is output in all specified levels of representation with all relevant linguistic and formatting information attached to each component of the document. If there are no more new purpose-parameter values to read in, the main program loop terminates.
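The main program loop, in the case where parameter values come from stored profiles, reduces to producing one output per profile until none remain. The sketch below assumes each profile is a map of parameter names to values and that the selection engine is supplied as a function; `TailoringLoop` and `run` are hypothetical names, and a single string stands in for the full set of representation levels.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of the main program loop: one customized output per
// set of purpose-parameter values. The resolver is passed in as a function.
final class TailoringLoop {
    static List<String> run(List<Map<String, String>> profiles,
                            Function<Map<String, String>, String> resolver) {
        List<String> outputs = new ArrayList<>();
        for (Map<String, String> profile : profiles) { // the loop ends when no profiles remain
            outputs.add(resolver.apply(profile));      // resolve the document for these values
        }
        return outputs;
    }
}
```

Because the setup stage runs once and only the loop body repeats, the cost of parsing and instantiating the master document is shared across every profile processed.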
The process of selecting the appropriate variation of each document structure is called "resolution". Figure 5 is a graph showing the resolution process for a customized document generated according to an embodiment of the present invention. The pseudocode for the Resolve procedures, which implement the resolution process, is as follows:
Procedure: Resolve (for VariationContainer classes)
Parameters:
- (input) An instance of WorkingCondition (a list of purpose parameters and their current values)
- (input) An instance of List (a list of the desired representation levels to output)
- (output) An instance of DocumentObjectSet, i.e., a set of references to resolved DocumentObjects, with one set member for each desired level of representation of this VariationContainer instance
Algorithm:
    if Resolve has already been called with this WorkingCondition then
        result is the previously saved result
    else
        call Satisfies on each variation of this VariationContainer until the WorkingCondition is satisfied
        call Resolve on this satisfying variation to return a DocumentObjectSet
        add the appropriate annotations for this VariationContainer into the returned DocumentObjectSet
    end if
    if no variation satisfies the WorkingCondition then
        result is null
    else
        result is the annotated DocumentObjectSet for this VariationContainer
    end if
Procedure: Resolve (for Variation classes)
Parameters:
- (input) An instance of WorkingCondition
- (input) An instance of List (a list of the desired representation levels to output)
- (output) An instance of DocumentObjectSet, i.e., a set of references to resolved DocumentObjects, with one set member for each desired level of representation of this Variation instance
Algorithm:
    create a new DocumentObjectSet for this Variation
    for each component of this Variation do
        call Resolve on the current component to return a DocumentObjectSet for this component
        add the appropriate annotations for this Variation into the returned DocumentObjectSet
        attach the annotated DocumentObjectSet as a child of the DocumentObjectSet for this Variation
    end for
    result is the annotated DocumentObjectSet for this Variation
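The two Resolve procedures can be illustrated with a minimal Python sketch. It is not the implementation of the described embodiment: the shape of DocumentObjectSet, the use of a callable for Satisfies, the cache key, and the choice to skip components for which no variation matches are all assumptions made for illustration, and representation levels are simply passed through.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

WorkingCondition = Dict[str, str]  # purpose parameter -> current value


@dataclass
class DocumentObjectSet:
    """Resolved output node (hypothetical shape, not specified in the text)."""
    name: str
    annotations: List[str] = field(default_factory=list)
    children: List["DocumentObjectSet"] = field(default_factory=list)


@dataclass
class Variation:
    name: str
    satisfies: Callable[[WorkingCondition], bool]
    components: List["VariationContainer"] = field(default_factory=list)
    annotations: List[str] = field(default_factory=list)

    def resolve(self, wc: WorkingCondition, levels: List[str]) -> DocumentObjectSet:
        result = DocumentObjectSet(self.name)
        for component in self.components:
            child = component.resolve(wc, levels)
            if child is None:  # assumption: components with no match are skipped
                continue
            # Copy rather than mutate, so cached results stay unchanged.
            annotated = DocumentObjectSet(
                child.name, child.annotations + self.annotations, child.children
            )
            result.children.append(annotated)
        return result


@dataclass
class VariationContainer:
    name: str
    variations: List[Variation]
    annotations: List[str] = field(default_factory=list)
    _cache: Dict[Tuple, Optional[DocumentObjectSet]] = field(
        default_factory=dict, repr=False
    )

    def resolve(self, wc: WorkingCondition, levels: List[str]) -> Optional[DocumentObjectSet]:
        key = (tuple(sorted(wc.items())), tuple(levels))
        if key in self._cache:  # "result is the previously saved result"
            return self._cache[key]
        result = None  # "result is null" when no variation is satisfied
        for variation in self.variations:
            if variation.satisfies(wc):
                resolved = variation.resolve(wc, levels)
                result = DocumentObjectSet(
                    resolved.name,
                    resolved.annotations + self.annotations,
                    resolved.children,
                )
                break
        self._cache[key] = result
        return result


# Usage, modelled on Topic topic7 from Example 1 of the sample data.
topic7 = VariationContainer(
    "topic7",
    [
        Variation("topic7a", lambda wc: wc.get("technical") == "low"),
        Variation("topic7b", lambda wc: wc.get("technical") == "high"),
    ],
    annotations=["html-topic-webbedoc-default"],
)
resolved = topic7.resolve({"technical": "high"}, ["english"])
print(resolved.name)  # topic7b
```

The first satisfying variation wins, which matches the "until the WorkingCondition is satisfied" wording of the pseudocode; any other tie-breaking policy would be a design decision outside what the text states.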
The "resolution" of the appropriate annotations for a given input data structure is handled as follows:
Each instance of a document class in the input data structure, down to the level of a Word instance, has an "annotations" property, which may be null, associated with it. This property is a list of all the Annotation objects that apply to this document-class instance and must contain one Annotation object for each of the distinct sub-taxonomies in the overall class of Annotation objects.
When a document-class instance is being resolved, its set of Annotation objects, as specified in its "annotations" property, will be collected and included in the content of the customized document.
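As a rough illustration of annotation collection, the following Python sketch gathers the Annotation objects named in an "annotations" property. The sample data structures give Annotation objects a "parent" property; the sketch assumes, without confirmation from the text, that a child annotation inherits its parent's properties and overrides them where both define the same key. The class and function names are hypothetical.

```python
class Annotation:
    """Hypothetical Annotation object with an optional parent link."""

    def __init__(self, name, properties, parent=None):
        self.name = name
        self.properties = dict(properties)
        self.parent = parent

    def effective_properties(self):
        # Assumption: parent properties are inherited, child values win.
        inherited = self.parent.effective_properties() if self.parent else {}
        inherited.update(self.properties)
        return inherited


def collect_annotations(annotations):
    """Return {annotation name: effective properties} for one instance."""
    return {a.name: a.effective_properties() for a in annotations}


# Usage, with values taken from Example 2 of the sample data.
toplevel = Annotation("html-doc-diabetes-toplevel", {"title": "About Your Diabetes"})
doc_a = Annotation("html-doc-diabetes-doc-a", {"bgcolor": "#ffffef"}, parent=toplevel)
collected = collect_annotations([doc_a])
print(collected["html-doc-diabetes-doc-a"])
```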
In summary, an application system which wishes to put the invention into practice must execute the following steps:
1. An input datafile of the structure described with reference to Figures 2(a), 2(b), and 2(c) is created.
2. A user interface in the form of a reader program to obtain new values of purpose parameters is provided. This reader program returns as output an instance of WorkingCondition, which is a set of purpose parameters and their values (implemented as a hashtable in one embodiment of the invention).
3. The main program first reads in the datafile, then calls the reader program to obtain a new instance of WorkingCondition. The main program will then start the resolution process for the toplevel Document object by passing it the WorkingCondition.
4. The output of each iteration of the main program will be a customized version of the document in all the levels of representation specified in the input datafile using the representation-level parameters. Each representation of the customized document is output for possible later processing by the application system.
References
Giuseppe Carenini, Vibhu O. Mittal, and Johanna D. Moore. "Generating patient-specific interactive natural language explanations." Proceedings, Eighteenth Annual Symposium on Computer Applications in Medical Care, Washington D.C., November 1994, 5-9.
Alison Cawsey, Kim Binsted, and Ray Jones. "Personalised explanations for patient education." Proceedings of the Fifth European Workshop on Natural Language Generation, 1995, 59-74.
Chrysanne DiMarco and Mary Ellen Foster. "The automated generation of Web documents that are tailored to the individual reader." Proceedings of the 1997 AAAI Spring Symposium on Natural Language Processing for the World Wide Web, Stanford University, March 1997.
Chrysanne DiMarco, Graeme Hirst, and Eduard Hovy. "Generation by selection and repair as a method for adapting text for the individual reader." Proceedings of the Workshop on Flexible Hypertext, Eighth ACM International Hypertext Conference, Southampton UK, April 1997.
Chrysanne DiMarco, Graeme Hirst, Leo Wanner, and John Wilkinson. "HealthDoc: Customizing patient information and health education by medical condition and personal characteristics." Workshop on Artificial Intelligence in Patient Education, Glasgow, August 1995.
Graeme Hirst, Chrysanne DiMarco, Eduard Hovy, and Kimberley Parsons. "Authoring and generating health-education documents that are tailored to the needs of the individual patient." In: Anthony Jameson, Cecile Paris, and Carlo Tasso (editors), User Modeling: Proceedings of the Sixth International Conference, UM97 (Chia Laguna, Sardinia, Italy), Vienna and New York: Springer Wien New York, June 1997, 107-118.
Eduard Hovy and Leo Wanner. "Managing sentence planning requirements." Proceedings, ECAI-96 Workshop on Gaps and Bridges: New Directions in Planning and Natural Language Generation, Budapest, August 1996.
Alistair Knott, Chris Mellish, Jon Oberlander, and Mick O'Donnell. "Sources of flexibility in dynamic hypertext generation." Proceedings, Eighth International Natural Language Generation Workshop, Herstmonceux Castle, June 1996, 151-160.
Maria Milosavljevic and Robert Dale. "Strategies for comparison in encyclopaedia descriptions." Proceedings, Eighth International Natural Language Generation Workshop, Herstmonceux Castle, UK, June 1996, 161-170.
Ehud Reiter, Chris Mellish, and John Levine. "Automatic generation of technical documentation." Applied Artificial Intelligence, 9, 1995, 259-287.
Victor J. Strecher, Matthew Kreuter, Dirk-Jan Den Boer, Sarah Kobrin, Harm J. Hospers, and Celette S. Skinner. "The effects of computer-tailored smoking cessation messages in family practice settings." The Journal of Family Practice, 39(3), September 1994, 262-270.
Leo Wanner and Eduard Hovy. "The HealthDoc sentence planner." Proceedings of the Eighth International Workshop on Natural Language Generation, Brighton, UK, June 1996.
Sample Data Structures
Selections from two sample data structures are presented in this section. The first example is for the customizable home page of the HealthDoc Project at the University of Waterloo (Waterloo, Canada). The second example is for a master document giving basic health information on diabetes.
Note that in the LexicalVariation labelled "lexDiabetesMaster-a" in Example 1, there is a link to another completely separate customizable document, contained in a different input datafile. This link is made through an instance of the External class.
Example 1: A Customizable Web Page
// The parameters
PurposeParameters
|role=CLexpert physician layperson funder& technical=high low& age=senior adult child& formality=formal informal& coolness=cool bland|
RepresentationLevelParameters
|levels=english|
// The top-level object
toplevel=Document.webbedoc
// The Documents and DocumentVariations
Document webbedoc
|title="The HealthDoc Project Home Page"& variations=doc-a doc-b doc-c doc-d doc-e doc-f doc-g doc-h& annotations=html-doc-webbedoc-toplevel|
DocumentVariation doc-a
|condition=(and (coolness cool) (age adult) (role CLexpert))& componentList=Section.sec1 Section.sec2 Section.sec3 Section.sec4 Section.sec5 Section.sec6 Section.sec7& annotations=html-doc-webbedoc-doc-a|
DocumentVariation doc-d
|condition=(and (coolness bland) (age adult))& componentList=Section.sec1 Section.sec2 Section.sec3 Section.sec4 Section.sec5 Section.sec6 Section.sec7& annotations=html-doc-webbedoc-doc-d|
// The Sections and SectionVariations
Section sec1
|variations=sec1a sec1b sec1c sec1d sec1e sec1f sec1g sec1h sec1i sec1j sec1k sec1l sec1m sec1n sec1o sec1p sec1q sec1r& annotations=html-sec-webbedoc-sec1|
SectionVariation sec1a
|condition=(and (role funder) (technical all) (coolness bland) (formality formal))& componentList=Section.subsec1-1 Section.subsec1-2& annotations=html-sec-webbedoc-sec1a|
Section subsec1-1
|variations=subsec1-1a& annotations=html-sec-webbedoc-subsec1-1|
SectionVariation subsec1-1a
|condition=()& componentList=Topic.topic1& annotations=html-sec-webbedoc-subsec1-1|
Section sec2
|variations=sec2a sec2b& annotations=html-sec-webbedoc-sec2|
Section subsec2-1
|variations=subsec2-1a& annotations=html-sec-webbedoc-subsec2-1|
SectionVariation subsec2-1a
|condition=()& componentList=Topic.topic4 Topic.topic5 Topic.topic6 Topic.topic7& annotations=html-sec-webbedoc-subsec2-1|
// The Topics and TopicVariations
Topic topic4
|variations=topic4a& annotations=html-topic-webbedoc-default|
TopicVariation topic4a
|condition=()& componentList=Sentence.sent4a-1 Sentence.sent4a-2& annotations=html-topic-webbedoc-default|
Topic topic5
|variations=topic5a topic5b topic5c topic5d& annotations=html-topic-webbedoc-default|
TopicVariation topic5a
|condition=(and (role physician) (technical high))& componentList=Sentence.sent5a-1& annotations=html-topic-webbedoc-default|
TopicVariation topic5c
|condition=(and (not (role physician)) (technical high))& componentList=Sentence.sent5c-1& annotations=html-topic-webbedoc-default|
Topic topic7
|variations=topic7a topic7b& annotations=html-topic-webbedoc-default|
TopicVariation topic7a
|condition=(technical low)& componentList=Sentence.sent7a-1& annotations=html-topic-webbedoc-default|
TopicVariation topic7b
|condition=(technical high)& componentList=Sentence.sent7b-1& annotations=html-topic-webbedoc-default|
Topic topic-compliance
|variations=topic-compliance-a& annotations=html-topic-webbedoc-default|
TopicVariation topic-compliance-a
|condition=()& componentList=Sentence.sent-compliance-1 Sentence.sent-compliance-2 Sentence.sent-compliance-3 Sentence.sent-compliance-4& annotations=html-topic-webbedoc-default|
// The Sentences and SentenceVariations
Sentence sent4a-1
|variations=sent4a-1i& annotations=html-sent-webbedoc-default|
SentenceVariation sent4a-1i
|condition=()& componentList=()& levelList=SentenceRepLevel.sent4a-1i-english& annotations=html-sent-webbedoc-default|
SentenceRepLevel sent4a-1i-english
|repLevel=english& componentList="Why do we want to be able to produce tailored documents?"& annotations=html-sent-webbedoc-default|
Sentence sent4a-2
|variations=sent4a-2i& annotations=html-sent-webbedoc-default|
SentenceVariation sent4a-2i
|condition=()& componentList=()& levelList=SentenceRepLevel.sent4a-2i-english& annotations=html-sent-webbedoc-default|
SentenceRepLevel sent4a-2i-english
|repLevel=english& componentList="Because research in communication has shown that people pay more attention to messages that are aimed just at them."& annotations=html-sent-webbedoc-default|
Sentence sent5a-1
|variations=sent5a-1i& annotations=html-sent-webbedoc-default|
SentenceVariation sent5a-1i
|condition=()& componentList=()& levelList=SentenceRepLevel.sent5a-1i-english& annotations=html-sent-webbedoc-default|
SentenceRepLevel sent5a-1i-english
|repLevel=english& componentList="Studies have shown that health information that is tailored to a patient's specific medical condition and personal characteristics is much more effective than generic information in influencing ^lexCompliance^ and subsequent outcome."& annotations=html-sent-webbedoc-default|
Sentence sent7a-1
|variations=sent7a-1i& annotations=html-sent-webbedoc-default|
SentenceVariation sent7a-1i
|condition=()& componentList=()& levelList=SentenceRepLevel.sent7a-1i-english& annotations=html-sent-webbedoc-default|
SentenceRepLevel sent7a-1i-english
|repLevel=english& componentList="But it can be very difficult to write and keep track of many versions of the same ^lexSynonyms1^."& annotations=html-sent-webbedoc-default|
Sentence sent8a-1
|variations=sent8a-1i& annotations=html-sent-webbedoc-default|
SentenceVariation sent8a-1i
|condition=()& componentList=()& levelList=SentenceRepLevel.sent8a-1i-english& annotations=html-sent-webbedoc-default|
SentenceRepLevel sent8a-1i-english
|repLevel=english& componentList="What is needed is a computer system for the production of tailored health-information and patient-education documents, that would, on demand, customize a 'master document' to the needs of a particular individual."& annotations=html-sent-webbedoc-default|
Sentence sent8a-2
|variations=sent8a-2i& annotations=html-sent-webbedoc-default|
SentenceVariation sent8a-2i
|condition=()& componentList=()& levelList=SentenceRepLevel.sent8a-2i-english& annotations=html-sent-webbedoc-default|
SentenceRepLevel sent8a-2i-english
|repLevel=english& componentList="The HealthDoc project has currently built the first ^lexDiabetesMaster^ of such a system."& annotations=html-sent-webbedoc-default|
Sentence sent-compliance-1
|variations=sent-compliance-1a& annotations=html-sent-webbedoc-default|
SentenceVariation sent-compliance-1a
|condition=()& componentList=()& levelList=SentenceRepLevel.sent-compliance-1a-english& annotations=html-sent-webbedoc-default|
SentenceRepLevel sent-compliance-1a-english
|repLevel=english& componentList="Recent experiments have shown that health-education material can be much more effective if it is customized for the individual reader in accordance with their medical conditions, demographic variables, personality profile, or other relevant factors."& annotations=html-sent-webbedoc-default|
Sentence sent-compliance-2
|variations=sent-compliance-2a& annotations=html-sent-webbedoc-default|
SentenceVariation sent-compliance-2a
|condition=()& componentList=()& levelList=SentenceRepLevel.sent-compliance-2a-english& annotations=html-sent-webbedoc-default|
SentenceRepLevel sent-compliance-2a-english
|repLevel=english& componentList="For example, Dr Victor Strecher (now at the Comprehensive Cancer Center of the University of Michigan) and colleagues sent unsolicited leaflets to patients of family practices on topics such as giving up smoking, improving dietary behaviour, or having a mammogram."& annotations=html-sent-webbedoc-default|
Sentence sent-compliance-3
|variations=sent-compliance-3a& annotations=html-sent-webbedoc-default|
SentenceVariation sent-compliance-3a
|condition=()& componentList=()& levelList=SentenceRepLevel.sent-compliance-3a-english& annotations=html-sent-webbedoc-default|
SentenceRepLevel sent-compliance-3a-english
|repLevel=english& componentList="Each leaflet was 'tailored' to the recipient, on the basis of data gathered from them in an earlier survey."& annotations=html-sent-webbedoc-default|
Sentence sent-compliance-4
|variations=sent-compliance-4a& annotations=html-sent-webbedoc-default|
SentenceVariation sent-compliance-4a
|condition=()& componentList=()& levelList=SentenceRepLevel.sent-compliance-4a-english& annotations=html-sent-webbedoc-default|
SentenceRepLevel sent-compliance-4a-english
|repLevel=english& componentList="In each study, the 'tailored' leaflets were found to have a significantly greater effect on the patients' behaviour than 'generic' leaflets had upon patients in a control group."& annotations=html-sent-webbedoc-default|
// The Lexicals and LexicalVariations
Lexical lexSynonyms1
|variations=lexSynonyms1a lexSynonyms1b& annotations=|
LexicalVariation lexSynonyms1a
|condition=(role physician)& string="brochure"& value=Word.brochure& annotations=|
LexicalVariation lexSynonyms1b
|condition=(not (role physician))& string="information"& value=Word.information& annotations=|
Lexical lexCompliance
|variations=lexCompliance-a& annotations=|
LexicalVariation lexCompliance-a
|condition=(role physician)& string="tailored health-information"& value=Topic.topic-compliance& annotations=|
Lexical lexDiabetesMaster
|variations=lexDiabetesMaster-a& annotations=|
LexicalVariation lexDiabetesMaster-a
|condition=()& string="prototype"& value=External.external-diabetes& annotations=|
// The Words
Word brochure
|value="brochure"|
Word information
|value="information"|
// The Annotations
Annotation html-doc-webbedoc-toplevel
|title="The HealthDoc Home Page"|
Annotation html-doc-webbedoc-doc-a
|background="blue2.gif"& bgcolor="#ffffef"& vlink="#990099"& alink="#990099"& link="#990099"& title-begin="<center><font size = +4 color = \"990099\">"& title-end="</font></center>"& image-dir="/~healthdo/images/"& image-align="alt"& parent=html-doc-webbedoc-toplevel|
Annotation html-sec-webbedoc-sec1
|title-begin="<font size = +1 color = \"990099\">"& title-end="</font>"& title="The goal of the HealthDoc project"& parent=html-doc-webbedoc-toplevel|
Annotation html-sec-webbedoc-sec1a
|image="businmen.jpg"& parent=html-sec-webbedoc-sec1|
Annotation html-sec-webbedoc-sec2
|title-begin="<font size = +1 color = \"990099\">"& title-end="</font>"& title="The motivation for the research"& parent=html-doc-webbedoc-toplevel|
// The External objects
External external-diabetes
|fileName="diabetes.master"& profile=<list of parameters describing current user/audience>|
Example 2: Customizable Health Information
// The parameters
PurposeParameters
|type=insulin-dependent non-insulin-dependent& technical=high-technical moderate-technical low-technical& age=senior adult young-adult child& locus-of-control=doctor patient|
RepresentationLevelParameters
|english spl|
// The top-level object
toplevel=Document.diabetes
// The Documents and DocumentVariations
Document diabetes
|title="Treating Your Diabetes"& variations=doc-a& annotations=html-doc-diabetes-toplevel discourse-doc-diabetes-toplevel|
DocumentVariation doc-a
|condition=()& componentList=Section.sec1 Section.sec2 Section.sec3 Section.sec4& annotations=html-doc-diabetes-doc-a discourse-doc-diabetes-toplevel|
// The Sections and SectionVariations
Section sec1
|variations=sec1a& annotations=html-sec-diabetes-sec1 discourse-sec-diabetes-sec1|
SectionVariation sec1a
|condition=()& componentList=Section.subsec1-1 Section.subsec1-2& annotations=html-sec-diabetes-sec1a discourse-sec-diabetes-sec1|
Section subsec1-1
|variations=subsec1-1a& annotations=html-sec-diabetes-subsec1-1 discourse-sec-diabetes-sec1|
SectionVariation subsec1-1a
|condition=()& componentList=Topic.topic1& annotations=html-sec-diabetes-subsec1-1a discourse-sec-diabetes-sec1|
Section subsec1-2
|variations=subsec1-2a& annotations=html-sec-diabetes-subsec1-2 discourse-sec-diabetes-sec1|
SectionVariation subsec1-2a
|condition=()& componentList=Topic.topic2 Topic.topic3 Topic.topic4 Topic.topic5 Topic.topic6& annotations=html-sec-diabetes-subsec1-2a discourse-sec-diabetes-sec1|
// The Topics and TopicVariations
Topic topic1
|variations=topic1a& annotations=html-topic-diabetes-default discourse-topic-diabetes-topic1|
TopicVariation topic1a
|condition=()& componentList=Sentence.sent1a-1 Sentence.sent1a-2& annotations=html-topic-diabetes-default discourse-topic-diabetes-topic1|
Topic topic3
|variations=topic3a topic3b& annotations=html-topic-diabetes-default discourse-topic-diabetes-topic3|
TopicVariation topic3a
|condition=(type insulin-dependent)& componentList=Sentence.sent3a-1& annotations=html-topic-diabetes-default discourse-topic-diabetes-topic3|
TopicVariation topic3b
|condition=(type non-insulin-dependent)& componentList=Sentence.sent3b-1& annotations=html-topic-diabetes-default discourse-topic-diabetes-topic3|
// The Sentences and SentenceVariations
Sentence sent1a-1
|variations=sent1a-1i& annotations=html-sent-diabetes-default discourse-sent-diabetes-default|
SentenceVariation sent1a-1i
|condition=()& componentList=()& levelList=SentenceRepLevel.sent1a-1i-english SentenceRepLevel.sent1a-1i-spl& annotations=html-sent-webbedoc-default|
SentenceRepLevel sent1a-1i-english
|repLevel=english& componentList=Lexical.lexDiab Lexical.lexis Lexical.lexa Lexical.lexgroup Lexical.lexof Lexical.lexconds Lexical.lexin Lexical.lexwhich Lexical.lexglucose Lexical.lexlevels Lexical.lexare Lexical.lexabnormally Lexical.lexhigh.& annotations=html-sent-diabetes-default discourse-sent-diabetes-default|
SentenceRepLevel sent1a-1i-spl
|repLevel=spl& componentList="<a Sentence Plan Language (SPL) form>"& annotations=html-sent-diabetes-default discourse-sent-diabetes-default|
Sentence sent3a-1
|variations=sent3a-1i& annotations=html-sent-diabetes-default discourse-sent-diabetes-default|
SentenceVariation sent3a-1i
|condition=()& componentList=()& levelList=SentenceRepLevel.sent3a-1i-english SentenceRepLevel.sent3a-1i-spl& annotations=html-sent-webbedoc-default|
SentenceRepLevel sent3a-1i-english
|repLevel=english& componentList="The condition that you have is insulin-dependent diabetes."& annotations=html-sent-diabetes-default discourse-sent-diabetes-default|
SentenceRepLevel sent3a-1i-spl
|repLevel=spl& componentList="(asc / ascription
    :tense present
    :domain (cond1 / abstraction
        :lex condition
        :determiner the
        :process (have / ownership
            :lex have-possession
            :tense present
            :domain (hearer / person)
            :range cond))
    :range (diab2 / abstraction
        :lex diabetes
        :determiner zero
        :property-ascription (ins / quality
            :lex insulin-dependent)))"& annotations=html-sent-diabetes-default discourse-sent-diabetes-default|
// The Lexicals and LexicalVariations
Lexical lexDiab
|variations=lexDiab-a& annotations=|
LexicalVariation lexDiab-a
|condition=()& string="Diabetes"& value=Word.Diab& annotations=|
Lexical lexglucose
|variations=lexglucose-a& annotations=|
LexicalVariation lexglucose-a
|condition=()& string="glucose"& value=Word.glucose& annotations=|
Lexical lexhigh
|variations=lexhigh-a& annotations=|
LexicalVariation lexhigh-a
|condition=()& string="high"& value=Word.high& annotations=|
// The Words
Word Diab
|value="Diabetes"|
Word glucose
|value="glucose"|
Word high
|value="high"|
// The Annotations
//
// HTML Annotations
Annotation html-doc-diabetes-toplevel
|title="About Your Diabetes"|
Annotation html-doc-diabetes-doc-a
|bgcolor="#ffffef"& title-begin="<h1 align=\"center\">"& title-end="</h1>"& parent=html-doc-diabetes-toplevel|
Annotation html-sec-diabetes-default
// <default HTML markup for any Section in diabetes document>|
Annotation html-sec-diabetes-sec1
|title="Basic information"& parent=html-sec-diabetes-default|
Annotation html-sec-diabetes-sec1a
|title-begin="<h2 align=\"center\">"& title-end="</h2>"& parent=html-sec-diabetes-sec1|
Annotation html-sec-diabetes-subsec1-1
|title="What is diabetes?"& parent=html-sec-diabetes-sec1|
Annotation html-sec-diabetes-subsec1-1a
|section-end="<p>"& title-begin="<h3>"& title-end="</h3>"& parent=html-sec-diabetes-subsec1-1|
Annotation html-sec-diabetes-subsec1-2
|title="The two types of diabetes"& parent=html-sec-diabetes-sec1|
Annotation html-sec-diabetes-subsec1-2a
|section-end="<p>"& title-begin="<h3>"& title-end="</h3>"& parent=html-sec-diabetes-subsec1-2|
//
// Linguistic Annotations
Annotation discourse-doc-diabetes-toplevel
|title="About Your Diabetes"|
Annotation discourse-sec-diabetes-sec1
|title="Basic information"& relations="((ord < (topic2a topic3a))
    (ord < (topic2a topic3b))
    (ord < (topic2a topic4a))
    (ord < (topic2a topic5a))
    (ord < (topic2a topic5b))
    (ord < (topic2a topic5c))
    (ord < (topic2a topic5d))
    (ord < (topic2a topic6a))
    (ord < (topic2a topic6b))
    (ord < (topic3a topic6a))
    (ord < (topic3b topic6b))
    (elaboration topic4a topic3b)
    (justification topic2a topic3a)
    (justification topic2a topic3b)
    (justification topic2a topic4a)
    (elaboration topic4a topic2a)
    (elaboration topic5a topic3a)
    (elaboration topic5b topic3b)
    (elaboration topic5c topic3a)
    (elaboration topic5d topic3b)
    (justification topic3a topic6a)
    (justification topic3b topic6b))"|
Annotation discourse-sec-diabetes-subsec1-1
|title="What is diabetes?"& parent=discourse-sec-diabetes-sec1|
Annotation discourse-sec-diabetes-subsec1-2
|title="The two types of diabetes"& parent=discourse-sec-diabetes-sec1|
Annotation discourse-topic-diabetes-topic3
|corefs="((cond1 specific cond) (diab2 specific diab) (cond1 generic diab2))"& parent=discourse-sec-diabetes-sec1|
The Pseudocode
The program objects are:
BasicObject.
ResolvableObject.
DocumentObject.
DocumentObjectSet.
VariationContainer.
Variation.
Condition.
WorkingCondition.
Annotation.
External.
The Procedures
Procedures used by BasicObject classes
These procedures are used within BasicObject classes (e.g., BasicDocument, BasicSection, etc.)
Procedure: SetProperties
Parameter: A list of property names and their corresponding values
Algorithm: for each property in the list do set its value
Procedures used by VariationContainer classes
These procedures are used within VariationContainer classes (e.g., Document, Section, Topic), which have a list of variations. VariationContainer classes are extensions of ResolvableObject and therefore have a Resolve method.
Procedure: SetProperties
Parameter: A list of property names and their corresponding values
Algorithm:
    for each property in the list do
        if property is a list of variations then
            set the class property "variations" to given value
        else if property is a list of annotations then
            set the class property "annotations" to given value
        else if property is defined for this class then
            set its value
        else
            signal an error
        end if
    end for
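A minimal Python sketch of SetProperties for a VariationContainer follows. The class name, the DEFINED_PROPERTIES set, and the use of KeyError to "signal an error" are assumptions for illustration; property values are assumed to arrive as space-separated strings, as they appear in the sample data structures.

```python
class VariationContainerSketch:
    # Hypothetical class-specific properties; the text only says
    # "defined for this class" without listing them.
    DEFINED_PROPERTIES = {"title"}

    def __init__(self, name):
        self.name = name
        self.variations = []
        self.annotations = []
        self.other = {}

    def set_properties(self, properties):
        for key, value in properties.items():
            if key == "variations":
                self.variations = value.split()
            elif key == "annotations":
                self.annotations = value.split()
            elif key in self.DEFINED_PROPERTIES:
                self.other[key] = value
            else:
                # "signal an error" in the pseudocode
                raise KeyError(f"{key!r} is not defined for {type(self).__name__}")


# Usage, with values taken from Document webbedoc in Example 1.
doc = VariationContainerSketch("webbedoc")
doc.set_properties({
    "title": "The HealthDoc Project Home Page",
    "variations": "doc-a doc-d",
    "annotations": "html-doc-webbedoc-toplevel",
})
print(doc.variations)  # ['doc-a', 'doc-d']
```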
Procedure: Resolve (for VariationContainer classes)
Parameters:
- (input) An instance of WorkingCondition (a list of purpose parameters and their current values)
- (input) An instance of List (a list of the desired representation levels to output)
- (output) An instance of DocumentObjectSet, i.e., a set of references to resolved DocumentObjects, with one set member for each desired level of representation of this VariationContainer instance
Algorithm:
    if Resolve has already been called with this WorkingCondition then
        result is the previously saved result
    else
        call Satisfies on each variation of this VariationContainer until the WorkingCondition is satisfied
        call Resolve on this satisfying variation to return a DocumentObjectSet
        add the appropriate annotations for this VariationContainer into the returned DocumentObjectSet
    end if
    if no variation satisfies the WorkingCondition then
        result is null
    else
        result is the annotated DocumentObjectSet for this VariationContainer
    end if
Procedures used by Variation classes
These procedures are used within Variation classes (e.g., DocumentVariation, SectionVariation), each of which has a selection condition.
Procedure: SetProperties
Parameters: A list of property names and their corresponding values
Algorithm:
    for each property in the list do
        if property is a condition then
            set the class property "condition" to given value
        else if property is a list of annotations then
            set the class property "annotations" to given value
        else if property is a list of components then
            set the class property "componentList" to given value
        else if property is defined for this class then
            set its value
        else
            signal an error
        end if
    end for
Procedure: Satisfies
Parameters:
- (input) An instance of WorkingCondition
- (output) A Boolean value
Algorithm:
    if this variation satisfies the WorkingCondition then
        result is true
    else
        result is false
    end if
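The Satisfies test can be sketched in Python against the condition syntax seen in the sample data structures, e.g. `(and (not (role physician)) (technical high))`. The function names are hypothetical, and the semantics are assumptions inferred from the samples: an empty condition `()` always holds, `and` and `not` combine sub-conditions, and a pair `(parameter value)` holds when the WorkingCondition maps that parameter to that value. The value `all` that appears in one sample condition gets no special treatment here.

```python
import re


def parse_condition(text):
    """Parse a parenthesised condition string into nested Python lists."""
    tokens = re.findall(r"[()]|[^\s()]+", text)
    pos = 0

    def read():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token == "(":
            items = []
            while tokens[pos] != ")":
                items.append(read())
            pos += 1  # consume the closing parenthesis
            return items
        return token

    return read()


def satisfies(condition, working_condition):
    """Return True if the parsed condition holds for the WorkingCondition."""
    if not condition:  # "()" : assumed to be unconditionally true
        return True
    head = condition[0]
    if head == "and":
        return all(satisfies(c, working_condition) for c in condition[1:])
    if head == "not":
        return not satisfies(condition[1], working_condition)
    parameter, value = condition
    return working_condition.get(parameter) == value


# Usage, with the condition of TopicVariation topic5c from Example 1.
cond = parse_condition("(and (not (role physician)) (technical high))")
print(satisfies(cond, {"role": "layperson", "technical": "high"}))  # True
```

Representing the WorkingCondition as a dictionary mirrors the hashtable mentioned for one embodiment.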
Procedure: Resolve (for Variation classes)
Parameters:
- (input) An instance of WorkingCondition
- (input) An instance of List (a list of the desired representation levels to output)
- (output) An instance of DocumentObjectSet, i.e., a set of references to resolved DocumentObjects, with one set member for each desired level of representation of this Variation instance
Algorithm:
    create a new DocumentObjectSet for this Variation
    for each component of this Variation do
        call Resolve on the current component to return a DocumentObjectSet for this component
        add the appropriate annotations for this Variation into the returned DocumentObjectSet
        attach the annotated DocumentObjectSet as a child of the DocumentObjectSet for this Variation
    end for
    result is the annotated DocumentObjectSet for this Variation
Toplevel Procedures
These are the toplevel procedures used to read in a datafile containing a master document, create instances of the document-class objects, then loop to read in new values of the purpose parameters and generate an appropriately customized version of the document, with all appropriate annotations, at each level of representation as specified by the representation-level parameters given in the datafile.
Procedure: Parse
Parameter: Name of datafile to be read in
Algorithm:
    while end-of-file has not been reached
        read in the next line of the datafile
        if the line is a comment or a blank line then
            skip over it
        else if the line specifies the toplevel DocumentObject then
            set the "toplevel" variable to reference this object
        else if the line specifies the purpose parameters and their values then
            set the "purposeParameters" variable
        else if the line specifies the possible representation levels then
            set the "repLevels" variable
        else
            instantiate the specified document object (i.e., create a new executable instance of this document-object class, set a reference to this instance, assign a name to this instance, assign its properties)
    end of while loop
    return the reference to the toplevel DocumentObject, the list of purpose parameters and their possible values, the list of desired representation levels
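The instantiation step of Parse depends on reading object declarations of the form seen in the sample data structures: a header such as `TopicVariation topic7a` followed by a property block `|name=value& name=value|`. The Python sketch below shows one way to split such a block. It is an assumption-laden illustration: the function names are hypothetical, and the rule that `&` only separates properties when it occurs outside double quotes is inferred from the samples rather than stated in the text.

```python
def parse_properties(block):
    """Split a '|name=value& name=value|' block into a dict of strings."""
    body = block.strip()
    if not (body.startswith("|") and body.endswith("|")):
        raise ValueError("property block must be delimited by '|'")
    body = body[1:-1]

    parts, current = [], []
    in_quotes = escaped = False
    for ch in body:
        if escaped:                    # character after a backslash
            current.append(ch)
            escaped = False
        elif ch == "\\":
            current.append(ch)
            escaped = True
        elif ch == '"':
            current.append(ch)
            in_quotes = not in_quotes
        elif ch == "&" and not in_quotes:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))

    properties = {}
    for part in parts:
        if not part.strip():
            continue
        name, _, value = part.partition("=")
        properties[name.strip()] = value.strip()
    return properties


def parse_declaration(header, block):
    """Return (class name, instance name, properties) for one object."""
    class_name, instance_name = header.split()
    return class_name, instance_name, parse_properties(block)


# Usage, with TopicVariation topic7a from Example 1.
decl = parse_declaration(
    "TopicVariation topic7a",
    "|condition=(technical low)& componentList=Sentence.sent7a-1& "
    "annotations=html-topic-webbedoc-default|",
)
print(decl[2]["condition"])  # (technical low)
```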
Procedure: Reader
Algorithm: call an application-specific interface to:
- read in current values of purpose parameters
- read in desired representation levels
- create an instance of WorkingCondition using these values
Procedure: Main
Parameter: Name of the input datafile
Algorithm:
    call Parse
    call Reader
    while new purpose parameters are input
        call Resolve on the toplevel object
        for each specified level of representation
            output the corresponding DocumentObjectSet
        end for
        call Reader
    end of while loop
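The Main loop can be sketched in Python as an iteration over previously compiled user profiles, one of the two input modes the description mentions for the reader program. Everything named here is a stand-in: `main_loop`, `resolve_word`, and the dictionary returned per representation level are hypothetical, and the toy resolver only mimics the lexSynonyms1 choice from Example 1 rather than resolving a full document.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

WorkingCondition = Dict[str, str]
Resolved = Optional[Dict[str, str]]  # level -> content (toy stand-in)


def main_loop(
    resolve_toplevel: Callable[[WorkingCondition, List[str]], Resolved],
    profiles: Iterable[WorkingCondition],
    levels: List[str],
) -> List[Tuple[str, Optional[str]]]:
    """Resolve the toplevel object once per profile and collect the outputs."""
    outputs = []
    for working_condition in profiles:  # each pass stands in for "call Reader"
        resolved = resolve_toplevel(working_condition, levels)
        for level in levels:
            content = None if resolved is None else resolved.get(level)
            outputs.append((level, content))
    return outputs  # the loop ends when no more profiles are available


def resolve_word(wc: WorkingCondition, levels: List[str]) -> Resolved:
    # Toy resolver based on lexSynonyms1: physicians get "brochure".
    word = "brochure" if wc.get("role") == "physician" else "information"
    return {level: word for level in levels}


profiles = [{"role": "physician"}, {"role": "layperson"}]
results = main_loop(resolve_word, profiles, ["english"])
print(results)  # [('english', 'brochure'), ('english', 'information')]
```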
The present application has been described with reference to a presently preferred embodiment. Modifications and variations of that embodiment will be apparent to a person of skill in the art. Such modifications and variants are believed to be within the scope of the present invention as defined in the claims appended hereto.

Claims

THE EMBODIMENTS OF THE INVENTION IN WHICH AN EXCLUSIVE PROPERTY OR PRIVILEGE IS CLAIMED ARE DEFINED AS FOLLOWS:
1. A computer system for generating custom versions of a master document in accordance with user purpose parameters, said system comprising: (a) a datafile including a data structure specifying said master document;
(b) a document class library for providing general document class definitions;
(c) a parser for creating executable instances of said document-class data structures from said datafile;
(d) a user input interface for inputting said purpose parameters; and (e) a selection engine for generating said custom versions of said document from said instances of said document class library by utilizing current values of said purpose parameters.
2. A computer system as defined in claim 1, said document including a plurality of elements and said data structure defining relationships between said elements and variations thereof.
3. A computer system as defined in claim 1, said instances including a plurality of elements and variations each arranged in a hierarchy of predetermined relationships.
4. A computer system as defined in claim 3, each element including a resolver for selecting from its variations and each variation including a resolver for expanding said variation into its component document objects.
5. A computer system as defined in claim 3, said elements being the components of said document structure.
6. A computer system as defined in claim 1, said instances each including linguistic information for providing automated grammatical and stylistic correction of said customized documents.
7. A computer readable memory device encoded with a data structure for generating custom versions of a master document, the data structure having a plurality of general class objects for describing elements and variations providing structure and content of said master document.
8. A device as defined in claim 6, said data structure including:
(a) a first data block for identifying user purpose parameters for selecting versions of said master document;
(b) a toplevel object for identifying a document-class instance; and
(c) a program structure for specifying the form and content of said custom versions of said master document.
9. A device as defined in claim 7, said data structure including an external object for describing instances of an external class for defining links to other master documents to thereby create a network of systems for generating custom documents.
10. A device as defined in claim 8, said links being hypertext links.
11. A method for generating custom versions of a master document implemented in a computer system and in accordance with user purpose parameters, said method comprising the steps of:
(a) specifying said master document having a datafile and including a data structure;
(b) providing a document class library having general document class definitions;
(c) parsing said datafile for creating executable instances of said document-class data structures;
(d) a user input interface for inputting said purpose parameters; and
(e) using current values of said purpose parameters to generate custom versions of said document from said instances of said document class library.
12. A method as defined in claim 11, including executing said instance for selecting from variations of an element and expanding each variation into its component document objects, wherein each of said instances includes a plurality of elements and variations each arranged in a hierarchy of predetermined relationships.
PCT/CA1998/000771 1997-08-11 1998-08-11 A method and apparatus for authoring of customizable multimedia documents WO1999008205A1 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
AU87959/98A AU8795998A (en) 1997-08-11 1998-08-11 A method and apparatus for authoring of customizable multimedia documents
EP98939453A EP1002284A1 (en) 1997-08-11 1998-08-11 A method and apparatus for authoring of customizable multimedia documents
US09/502,233 US6938203B1 (en) 1997-08-11 2000-02-11 Method and apparatus for authoring of customizable multimedia documents

Applications Claiming Priority (6)

Application Number Priority Date Filing Date Title
GB9716986.6 1997-08-11
GBGB9716986.6A GB9716986D0 (en) 1997-08-11 1997-08-11 A method and apparatus for authoring of customizable multimedia documents
GBGB9720133.9A GB9720133D0 (en) 1997-09-22 1997-09-22 A method and apparatus for authoring of customizable multimedia documents
GB9720133.9 1997-09-22
CA2,230,367 1998-02-24
CA002230367A CA2230367C (en) 1997-08-11 1998-02-24 A method and apparatus for authoring of customizable multimedia documents

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US09/502,233 Continuation US6938203B1 (en) 1997-08-11 2000-02-11 Method and apparatus for authoring of customizable multimedia documents

Publications (1)

Publication Number Publication Date
WO1999008205A1 WO1999008205A1 (en) 1999-02-18

Family

ID=27170640

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CA1998/000771 WO1999008205A1 (en) 1997-08-11 1998-08-11 A method and apparatus for authoring of customizable multimedia documents

Country Status (3)

Country Link
EP (1) EP1002284A1 (en)
AU (1) AU8795998A (en)
WO (1) WO1999008205A1 (en)

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
C. DIMARCO ET AL: "The automated generation of Web documents that are tailored for the reader", PROCEEDINGS OF THE 1997 AAAI SPRING SYMPOSIUM ON NATURAL LANGUAGE PROCESSING FOR THE WORLD WIDE WEB, March 1997 (1997-03-01), Stanford University, Palo Alto, CA, US, pages 44 - 53, XP002086363 *
E. H. HOVY: "A New Level of Language Generation Technology: Capabilities and Possibilities", IEEE EXPERT, vol. 7, no. 2, April 1992 (1992-04-01), Los Alamitos, CA, US, pages 12 - 17, XP000331536 *
G. HIRST ET AL: "Authoring and Generating Health-Education Documents That Are Tailored to the Needs of the Individual Patient", USER MODELING: PROCEEDINGS OF THE SIXTH INTERNATIONAL CONFERENCE UM97, June 1997 (1997-06-01), Sardinia, IT, pages 107 - 118, XP002086454 *

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6981214B1 (en) * 1999-06-07 2005-12-27 Hewlett-Packard Development Company, L.P. Virtual editor and related methods for dynamically generating personalized publications
US8452809B2 (en) 1999-08-31 2013-05-28 International Business Machines Corporation Workflow management system for generating output material based on customer input
US7302430B1 (en) 1999-08-31 2007-11-27 International Business Machines Corporation Workflow management system for generating output material based on customer input
US7028303B2 (en) 1999-09-17 2006-04-11 International Business Machines Corporation Method, system, and program for processing a job in an event driven workflow environment
US7506247B2 (en) 1999-10-15 2009-03-17 International Business Machines Corporation Method for capturing document style by example
US7716575B2 (en) 1999-10-15 2010-05-11 International Business Machines Corporation System for capturing document style by example
US7761788B1 (en) 1999-10-15 2010-07-20 International Business Machines Corporation System and method for capturing document style by example
US7171373B2 (en) 1999-10-21 2007-01-30 International Business Machines Corporation Database driven workflow management system for generating output material based on customer input
WO2001039020A3 (en) * 1999-11-24 2002-10-03 Sun Microsystems Inc System, method and computer program product for publishing web page content having uniform predetermined format and features
WO2001039020A2 (en) * 1999-11-24 2001-05-31 Sun Microsystems, Inc. System, method and computer program product for publishing web page content having uniform predetermined format and features
US7246315B1 (en) 2000-05-10 2007-07-17 Realtime Drama, Inc. Interactive personal narrative agent system and method
WO2002003238A1 (en) * 2000-07-04 2002-01-10 Oks Gmbh Automatic creation of publishing documents over the internet
EP1170672A1 (en) * 2000-07-04 2002-01-09 OKS GmbH Automatic generation of publishing documents on the internet
WO2002025483A1 (en) * 2000-09-22 2002-03-28 Vizion Factory E-Learning A/S Electronic document system

Also Published As

Publication number Publication date
AU8795998A (en) 1999-03-01
EP1002284A1 (en) 2000-05-24

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AL AM AT AU AZ BA BB BG BR BY CA CH CN CU CZ DE DK EE ES FI GB GE GH GM HR HU ID IL IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT UA UG US UZ VN YU ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW SD SZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
NENP Non-entry into the national phase

Ref country code: KR

WWE Wipo information: entry into national phase

Ref document number: 09502233

Country of ref document: US

WWE Wipo information: entry into national phase

Ref document number: 1998939453

Country of ref document: EP

NENP Non-entry into the national phase

Ref country code: CA

WWP Wipo information: published in national office

Ref document number: 1998939453

Country of ref document: EP

REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

WWW Wipo information: withdrawn in national office

Ref document number: 1998939453

Country of ref document: EP