WO2002103555A2 - Computer system with natural language to machine language translator - Google Patents

Publication number
WO2002103555A2
Authority
WO
WIPO (PCT)
Prior art keywords
syntactic
terms
formal
semantic
expressions
Prior art date
Application number
PCT/GB2002/002742
Other languages
French (fr)
Other versions
WO2002103555A3 (en)
Inventor
Keith S. Manson
Original Assignee
Pipedream Metasystems, Inc.
Howe, Steven
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Pipedream Metasystems, Inc. and Howe, Steven
Priority to EP02732949A (EP1397753A2)
Priority to AU2002304439A (AU2002304439A1)
Publication of WO2002103555A2
Publication of WO2002103555A3

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/55 Rule-based translation

Definitions

  • the present invention is directed to a system which translates natural (human) language into an abstract formal language.
  • This formal language is explicitly designed to serve as a universal template for further translations into a comprehensive variety of machine languages which are executable in specific operational environments.
  • Extensive efforts have been made, many articles have been published, and many patents have been issued, all directed toward the goal of providing computers with the capacity to understand natural (human) language sufficiently well to respond reliably and accurately to directives issued from human users.
  • Many companies and research groups, such as AT&T, IBM, and Microsoft, and an assortment of academic institutions, are presently working on natural language processing (NLP).
  • U.S. Patent No. 5,895,466, wherein a database stores a plurality of answers which are indexed to natural language keys.
  • the natural language device receives a natural language question over the network from a remote device and the question is analyzed using a natural language understanding system. Based on this analysis, the database is then queried and an answer is provided to the remote device.
  • a system and method for converting or translating expressions in a natural language such as English into machine executable expressions in a formal language. This translation enables a transformation from the syntactic structures of a natural language into effective algebraic forms for further exact processing.
  • the invention utilizes algorithms employing a reduction of sequences of terms defined over an extensible lexicon into formal syntactic and semantic structures. This term reduction incorporates both syntactic type and semantic context to achieve an effective formal representation and interpretation of the meaning conveyed by any natural language expression.
  • FIG. 1 shows the hardware architecture of a computer system comprising the natural language converter of the present invention
  • FIG. 2 shows the general process and data flow of the inventive system
  • FIG. 3 shows a more detailed flow diagram for the inventive system
  • FIG. 4a shows the results of virtual type assignment applied to a sample text
  • FIG. 4b shows the results of actual type assignment for the same text
  • FIG. 5 shows the term reduction sequence for a sample text
  • FIG. 6a shows the sequence of dependency chains for a sample text
  • FIG. 6b shows the associated syntactic tree for the same text
  • FIG. 7a shows the schema of structures and maps involved in the external interpretation of a text
  • FIG. 7b shows this external interpretation schema as controlled by a metasemantic protocol.
  • the inventive system provides a method for translating expressions in a natural language such as English into machine executable expressions.
  • the user inputs text in a natural language through some input device to a known computer system which may comprise a standalone computer system, a local network of computing devices, or a global network such as the Internet, using wired land lines, wireless communication, or some combination thereof, etc.
  • This computer system includes memory for storing data, and a data processor.
  • the text may be entered into the client device or local VDM (Video Display Monitor) (1.1) by any suitable means, such as direct input via keyboard (1.1.1), voice input via speech recognition means (an SR system) (1.1.2), or indirect input via optical scanning (an OCR system) (1.1.3).
  • the natural language text input to the system is passed along the network or local bus (1.3) to a server or local CPU (Central Processing Unit) (1.2) where it is processed in accordance with the inventive method and system.
  • This processed output of the system is then provided to the system for distribution to the original input device (1.1), or to other collateral devices (1.4) which may be one or more digital computers, mobile devices, etc.
  • the inventive system thus comprises a natural language interface to any sufficiently capable digital environment. Refer now to FIG.
  • Natural language text input is entered by the user (2.0) into the internal system (2.1) by means of a text processing module (2.1.1) which parses the text.
  • the output of the text processing module comprises a parsed sequence of preexpressions which is entered into the syntactic processing module (2.1.2) which provides syntactic type information, establishes proper syntactic dependencies between terms in expressions, and represents these expressions as complexes in a syntactic algebra.
  • the output of the syntactic processing module comprising a sequence of these syntactic complexes, is entered into the semantic processing module (2.1.3) in order to achieve a semantic interpretation of the input text.
  • the output of the semantic processing module comprising a formal interpretation of the input text, is entered into an external system by means of the external processing module (2.2.1), which finally provides a sequence of executable expressions derived from the input text for use in a specific operational environment (2.2.2).
  • the means for providing text input to the system are well known in the art and are commercially available.
  • Another standard component of the present system is a text parser, construed here in an extremely narrow sense as a limited process restricted to partitioning text strings into syntactic subcomponents such as paragraphs, sentences, and words.
  • the text parser discussed herein does not provide further linguistic information such as grammatical types, syntactic dependencies, semantic import, etc.
  • Such limited text parsers are standard components of any natural language processing system, and exceedingly well known in the art.
  • Yet another component in the present system which plays a relatively standard role is the lexicon or "electronic dictionary".
  • lexicons are also well known in the art and are discussed in many patents including U.S. Patent Nos. 5,371,807; 5,724,594; 5,794,050; and 5,966,686.
  • the notion and function of "virtual" types which play a significant syntactic categorization role in the passive specification of lexical terms, and hence strongly contribute to the definition of the particular lexicon used in the inventive system, are not standard, and so require careful description.
  • text input devices, text parsers, and their operation are so well known, they will not be further described in detail herein.
  • FIG. 3 shows more details of the inventive system.
  • the components, modules, and submodules of the inventive system are enumerated for convenient reference so that the operation and application of the system and method may be described in detail.
  • natural language text is entered by the user (3.0) into the text input submodule (3.1.1) of the text processing module (3.1) via any suitable means including a keyboard or a speech recognition system.
  • the user input signal is simply some linguistic data stream which is digitized into a string of ASCII characters. This ASCII string is the input text.
  • any natural language text is typically organized into a sequence of paragraphs, each of which is a sequence of sentences, each of which is a sequence of words, each of which is a sequence of characters (alphanumeric symbols). All of this nested syntactic structure must be taken into account if an effective interpretive analysis is to be achieved.
  • the role of the text parser is to determine and then present these nested sequential structures to the system for further processing.
  • the adequate output of the text parser is a sequence of sequences of sequences of sequences of ASCII characters.
  • any text is considered to comprise a sequence of sentences, or more properly, of expressions, each of which comprises a sequence of words.
  • any such word in a text is simply treated as a partitioned substring of the input text string.
  • the proper output of the text parser is considered here to be a sequence of sequences of "pretokens", where a pretoken is a text fragment which is a candidate for a word, i.e. an ASCII (sub)string presented for recognition as a system "token".
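The narrow parsing role described above can be illustrated with a minimal Python sketch. It is not the patent's parser: the function name `parse_text`, the choice of `.`, `?` and `!` as expression terminators, and the treatment of each punctuation mark as its own pretoken are all assumptions made for illustration.

```python
import re

def parse_text(text: str) -> list[list[str]]:
    """Partition a text into preexpressions, each a list of pretokens.

    Assumption: an expression ends at '.', '?' or '!'. No grammatical
    types or dependencies are computed here, only the partition.
    """
    preexpressions: list[list[str]] = []
    current: list[str] = []
    # Words (letters, digits, apostrophes) or single punctuation marks.
    for pretoken in re.findall(r"[A-Za-z0-9']+|[^\sA-Za-z0-9']", text):
        current.append(pretoken)
        if pretoken in {".", "?", "!"}:
            preexpressions.append(current)
            current = []
    if current:  # trailing fragment without a terminator
        preexpressions.append(current)
    return preexpressions
```

Each inner list is a preexpression, and each string in it is only a candidate for a token until it has been checked against the lexicon.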
  • the system lexicon is a lexicographically ordered list of such tokens (with associated type and reference data), and recognition by the system of a pretoken as an actual token is simply a matter of exact string comparison.
  • the output of the text parser (3.1.2) is a sequence of sequences of pretokens, or sequence of "preexpressions", which is then passed to the type assignment submodule (3.2.1.1) of the type association submodule (3.2.1), where syntactic processing is initiated.
  • Each pretoken is checked against the system lexicon (3.2.0) for its status as a recognized lexical token. If a pretoken is recognized, i.e. if the string comprising that pretoken is included in the lexicon as an actual token (with associated syntactic and semantic data), then it is assigned a lexically associated syntactic type.
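Recognition by exact string comparison against a lexicographically ordered lexicon can be sketched as a binary search. The entries, the type labels `vrb` and `det`, and the reference strings below are hypothetical placeholders, not data from the patent. The sketch also assumes one entry per token, although the lexicon described here may hold several.

```python
from bisect import bisect_left

# Hypothetical lexicon: (token, lexical type, reference), kept in sorted order.
LEXICON = sorted([
    ("file", "obj", "ref:file"),
    ("print", "vrb", "ref:print"),
    ("the", "det", "ref:the"),
])
TOKENS = [entry[0] for entry in LEXICON]

def recognize(pretoken):
    """Return the lexical type if the pretoken is a system token, else None."""
    i = bisect_left(TOKENS, pretoken)
    if i < len(TOKENS) and TOKENS[i] == pretoken:  # exact string comparison
        return LEXICON[i][1]
    return None

def assign_virtual_types(preexpression):
    """Pair each pretoken with its first-order (virtual) type, or None."""
    return [(p, recognize(p)) for p in preexpression]
```

A `None` in the result marks a pretoken that would be routed to lexical insertion rather than passed on for contextualization.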
  • the system determines at decision node (3.2.1.2) whether all the pretokens from the entered text have been recognized as system tokens. If the determination is negative, then as indicated by the "no" connection to the lexical insertion submodule (3.2.1.3), the user is given the option to add the unrecognized pretokens to the system as tokens with associated type and reference data, i.e. to insert new terms into the lexicon, for further processing. On the other hand, if the determination is affirmative, then the resulting sequence of sequences of lexically typed tokens, or sequence of "virtual" expressions, is passed along the "yes" connection to the type contextualization submodule (3.2.1.4).
  • This submodule initiates a second order type assignment which uses the initial (or virtual) lexical type assignments as data for a contextual process which may reassign these initial types depending on the relative syntactic roles of the tokens in the virtual expressions being processed.
  • each virtual expression is promoted to an "actual" expression, and each token/type pair becomes a fully functional lexical term with associated semantic data.
  • the output of the type association submodule (3.2.1) of the syntactic processing module (3.2) comprises a sequence of (actual) expressions, and is passed to the term correlation submodule (3.2.2.1) of the term resolution module (3.2.2).
  • the output of this submodule is a sequence of sequences of fully correlated lexical terms, which is then entered into the term reduction submodule (3.2.2.2), wherein proper syntactic dependencies between terms in an expression are established by means of a type reduction matrix.
  • the output of this submodule is a sequence of sequences of reduction links, which is entered into the term inversion submodule (3.2.2.3), wherein these reduction links are used to construct syntactic trees, each tree representing a processed expression.
  • the resulting sequence of syntactic trees is passed to the syntactic representation submodule (3.2.3), wherein each expression is then represented as a syntactic complex, i.e. a (usually composite) term in the syntactic algebra.
  • Semantic processing (3.3) is initiated in the semantic representation submodule (3.3.1), wherein the input sequence of syntactic complexes from the syntactic processing module (3.2) is represented as a full semantic complex, i.e. a structure of internal objects in the semantic algebra.
  • This semantic complex is then passed to the formal representation submodule (3.3.2), wherein the input semantic complex is represented as a formal structure adhering to a fundamental transaction paradigm.
  • This formal semantic model is then combined with the sequence of syntactic complexes output from the syntactic processing module to form the input to the formal interpretation submodule (3.3.3), wherein a sequence of formal expressions is constructed as an interpretation of the presented syntactic and semantic data.
  • the output of the formal representation submodule is passed to the external representation submodule (3.4.1) of the external processing module (3.4), wherein a specific external representation appropriate for the formal semantic data presented is identified.
  • This external representation is combined with the sequence of formal expressions output from the formal interpretation submodule to form the input to the external interpretation submodule (3.4.2), wherein a sequence of executable expressions is constructed accordingly for ultimate processing in the appropriate operational environment (3.5).
  • METASCRIPT is a translation from a natural language into an executable formal language. This translation is essentially a transformation from the syntactic structures of natural language into effective algebraic forms suitable for further processing.
  • the formal semantics which finally determines the ensuing interpretations and executions of these formal expressions in external operational environments is object-oriented.
  • the fundamental algorithm upon which METASCRIPT is based employs a reduction to formal syntactic structures over terms defined in an extensible lexicon.
  • This term reduction incorporates both syntactic type and semantic context to achieve an effective formal representation and interpretation of the meaning conveyed by any natural language expression.
  • Extensibility of the lexicon under specific user direction provides the capacity for the system to expand its knowledge of vocabulary and usage, and consequently, offers an effective mechanism under user control for establishing definite incremental enhancements to the system's linguistic capabilities, hence substantially increasing the system's familiarity with (and competence in) particular operational environments. Put simply, the system learns as it goes.
  • any desired level of syntactic disambiguation is attainable by increasing the local dimensionality of the underlying reduction matrix, though this feature is part of the underlying algorithm, and therefore independent of user modulation.
  • METASCRIPT is not a speech recognition system. Instead, it is a fully capable natural language interpreter. Specifically, METASCRIPT translates natural language expressions into expressions in a formal language associated with an abstract network protocol. A more detailed account of this process follows.
  • a set is a collection of things, called elements.
  • $N = \{0, 1, 2, 3, \ldots\}$ is the set of natural numbers.
  • a set $A$ is most conveniently determined by some property $P$ of its elements, indicated by use of so-called "set-builder notation" as $A = \{x \mid P(x)\}$, the set of things $x$ satisfying property $P$.
  • the expression $x \in A$ indicates that the thing $x$ is an element of the set $A$
  • the expression $A \subseteq B$ indicates that the set $A$ is a subset of the set $B$, i.e. that every element of $A$ is an element of $B$ as well
  • d) a map (or function) is a relation between two sets $A, B$ such that each element in $A$ is assigned a unique element in $B$.
  • the expression $f: A \to B$ indicates that $f$ is a map from the set $A$ to the set $B$, i.e. $f$ assigns a unique element $f(x) \in B$ to each element $x \in A$.
  • a program is a function which maps input onto output in an effective manner, i.e. by means of a finite, discrete, deterministic procedure; in fact, any process or procedure is effective precisely to the extent that it is executable as a program of this sort.
  • the union $A \cup B$ is the set consisting of all elements $x$ such that $x \in A$ or $x \in B$, i.e. $A \cup B = \{x \mid x \in A \text{ or } x \in B\}$
  • the union $\bigcup C$ is the set consisting of unions over all sets in $C$
  • g) for any algebras $A, B$ and representation $f: A \to B$, the correlated tensor product $A \otimes_f B$ is the distinguished subset of $A \times B$ which consists of all pairs $(x, f(x))$ for $x \in A$, i.e. $A \otimes_f B = \{(x, f(x)) \mid x \in A\}$, the graph of $f$; for an implicit representation, the map subscript may be omitted, i.e. $A \otimes B$
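Since the correlated tensor product is described as the graph of the representation, a small Python sketch may make it concrete. The function name `correlated_tensor_product` and the example map are illustrative assumptions. Finite Python sets stand in for the algebras.

```python
def correlated_tensor_product(A, f):
    """Return the graph of f over A: the set of all pairs (x, f(x)) for x in A."""
    return {(x, f(x)) for x in A}

# Example with a hypothetical representation f(x) = x * x on a finite set.
pairs = correlated_tensor_product({1, 2, 3}, lambda x: x * x)
```

Every pair in the result lies in the Cartesian product of the domain and codomain, and each element of the domain appears in exactly one pair.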
  • object language: a formal language which is interpretable relative to a class of extensional structures, i.e. a formal language with an object-oriented semantics
  • protocol language: a formal language which mediates transactions between addressable nodes on a network
  • executable language: a formal, programmable language which encodes instructions directly implementable by a suitably capable machine such as a computer
  • system: the integrated process which manifests METASCRIPT, and which may be implemented as software running on any programmable device
  • string: a sequence of ASCII characters
  • text: a string presented to the system as the fundamental unit of initial input
  • parser: a process which partitions texts into sequences of sequences of substrings
  • preexpression: a sequence of substrings of some text, distinguished as a unit of syntactic processing by the text parser
  • lexicon: a system specific, indexed, lexicographically ordered list of designated strings, called tokens, each of which is associated with certain syntactic and semantic information; in particular, each token is associated with a lexical type, which may be virtual (syntactically ambiguous) or actual (syntactically unambiguous); furthermore, each token which is associated with an actual type is also associated with a lexical reference, which provides basic semantic information
  • a single string may serve as a token with multiple entries, associated with a number (including 1) of virtual types and a number of actual types, reflecting that token's multiple syntactic roles, e.g. as a verb and an object, or an object and an adjective, etc.
  • token: a string recognized by the system in the sense that it is included in the system lexicon
  • type: a syntactic category used to organize semantically similar tokens; there are three sorts:
  • syntactic an element of the syntactic algebra associated with a language
  • semantic an element of the semantic object algebra associated with a language
  • a dynamic structure $E$ represented as a series of static states $E_k$ (for $k \in N$), each of which comprises a model for an executable language $EL$
  • $\mathrm{Sym}(L)$: the set of basic symbols for $L$, usually including an alphabet and various punctuation symbols
  • $\mathrm{Tok}(L)$: the set of lexical tokens for $L$, i.e. a distinguished subset of $\mathrm{Seq}(\mathrm{Sym}(L))$
  • $\mathrm{Ltp}(L)$: the set of lexical types for $L$
  • $\mathrm{Rtp}(L)$: the set of reduced types for $L$, including a null type
  • $\mathrm{Trm}(L)$: the set of lexical terms for $L$, i.e. a distinguished subset of $\mathrm{Tok}(L) \times \mathrm{Ltp}(L)$
  • $\mathrm{Rdn}(L)$: the set of reduced terms for $L$, i.e. a distinguished subset of $\mathrm{Tok}(L) \times \mathrm{Rtp}(L)$
  • $\mathrm{Lex}(L)$: the lexicon for $L$, i.e. a distinguished subset of $N \times \mathrm{Trm}(L) \times \mathrm{Ref}(L)$ consisting of a lexicographically ordered list of (index/token/type/reference) entries
  • $\mathrm{Txt}(L)$: the set of texts for $L$, i.e. a subset of $\mathrm{Seq}(\mathrm{Sym}(L))$ determined by user input
  • $\mathrm{Prx}(L)$: the set of preexpressions for $L$, i.e. a subset of $\mathrm{Seq}(\mathrm{Seq}(\mathrm{Sym}(L)))$ determined by the parser
  • $\mathrm{Exp}(L)$: the set of expressions for $L$, i.e. a distinguished subset of $\mathrm{Seq}(\mathrm{Tok}(L))$
  • $\mathrm{Tre}(L)$: the set of syntactic trees for $L$, each having reduced terms from $\mathrm{Rdn}(L)$ as nodes
  • $\mathrm{Env}(L)$: the set of external operational environments for $L$
  • $\mathit{txtprs}: \mathrm{Txt}(NL) \to \mathrm{Seq}(\mathrm{Prx}(NL))$ is the text parser which maps any text string $s \in \mathrm{Txt}(NL)$ onto a sequence $\mathit{txtprs}(s) \in \mathrm{Seq}(\mathrm{Prx}(NL))$ of preexpressions, each of which is a sequence of pretokens, i.e. strings which are candidates for tokens (including punctuations) as determined by inclusion in the lexicon
  • $\mathit{lextyp}: N \times \mathrm{Tok}(NL) \to \mathrm{Typ}(NL)$ is the lexical type assignment such that $\mathit{lextyp}(n,a) \in \mathrm{Typ}(NL)$ is the lexical type associated in the lexicon with the indexed token $(n,a)$
  • $\mathit{lexref}: \mathrm{Tok}(NL) \times \mathrm{Typ}(NL) \to \mathrm{Ref}(NL)$ is the lexical reference assignment such that $\mathit{lexref}(a,e) \in \mathrm{Ref}(NL)$ is the reference associated in the lexicon with the token/type pair $(a,e) \in \mathrm{Tok}(NL) \times \mathrm{Typ}(NL)$
  • lexical reference properly reduces to an assignment $\mathit{lexref}: \mathrm{Trm}(NL) \to \mathrm{Ref}(NL)$ on actual lexical terms, rather than on random token/type pairs, or even on lexical entries in general.
  • $\mathrm{Syn}(NL) \times \mathrm{Syn}(NL) \to \mathrm{Syn}(NL)$ is term application on pairs of complexes $q, q' \in \mathrm{Syn}(NL)$ such that $q[q'] \in \mathrm{Syn}(NL)$ is the composite complex induced by the application of the syntactic term $q$ to the term $q'$
  • $\mathit{subtrm}: \mathrm{Syn}(NL) \times \mathrm{Rtp}(NL) \to \mathrm{Syn}(NL)$ is the subterm designator such that $\mathit{subtrm}(q,d) \in \mathrm{Syn}(NL)$ is the leading subcomplex of syntactic type $d \in \mathrm{Rtp}(NL)$ in the syntactic complex $q \in \mathrm{Syn}(NL)$.
  • $\mathit{subtrm}(0,d) = 0$ for any type $d \in \mathrm{Rtp}(NL)$, where $0 \in \mathrm{Syn}(NL)$ is the null term
  • $\mathit{synrep}: \mathrm{Exp}(NL) \to \mathrm{Syn}(NL)$ is the syntactic representation such that $\mathit{synrep}(p) \in \mathrm{Syn}(NL)$ is the syntactic complex associated with the expression $p \in \mathrm{Exp}(NL)$
  • Sem(NL) is simply the algebraic closure under the semantic product over the kernel Ref(NL) of lexical references associated with the lexical terms of NL by means of the map lexref: Trm(NL) -> Ref(NL), i.e.
  • $\mathrm{Sem}(NL)$ is defined by induction on the semantic product as follows: a) identity object: there is an identity element $1 \in \mathrm{Sem}(NL)$; b) lexical object: $\mathit{lexref}(a,e) \in \mathrm{Sem}(NL)$ for any lexical term $(a,e) \in \mathrm{Trm}(NL)$, i.e.
  • $\mathrm{Sem}(NL) \to \mathrm{Mod}(XL)$ is the formal representation, which assigns to each semantic context $x \in \mathrm{Sem}(NL)$ its associated formal model in $\mathrm{Mod}(XL)$
  • $\mathrm{Tns}(NL) \to \mathrm{Exp}(XL)$ is the formal interpretation, which assigns to each tensored complex $u \in \mathrm{Tns}(NL)$ its associated formal expression in $\mathrm{Exp}(XL)$
  • $\mathit{extrep}: \mathrm{Mod}(XL) \to \mathrm{Env}(NL)$ is the external representation such that $\mathit{extrep}(M) \in \mathrm{Env}(NL)$ is the external operational environment associated with the internal formal model $M \in \mathrm{Mod}(XL)$
  • $\mathit{extint}_E: \mathrm{Trm}(PL) \to \mathrm{Trm}(EL)$ is the external term interpretation relative to an operational environment $E \in \mathrm{Env}(NL)$ with an associated executable language $EL$ such that $\mathit{extint}_E(v) \in \mathrm{Trm}(EL)$ is the external term corresponding to the protocol term $v \in \mathrm{Trm}(PL)$
  • $\mathit{exttrn}_E: \mathrm{Exp}(XL) \to \mathrm{Exp}(EL)$ is the external translation relative to an operational environment $E \in \mathrm{Env}(NL)$ with an associated executable language $EL$ such that $\mathit{exttrn}_E(\varphi) \in \mathrm{Exp}(EL)$ is the executable translate of the formal expression $\varphi \in \mathrm{Exp}(XL)$
  • $\mathit{envexc}_E: \mathrm{Exp}(EL) \to E$ is the execution process for an operational environment $E \in \mathrm{Env}(NL)$ with an associated executable language $EL$ such that $\mathit{envexc}_E(\varphi) \in E$ is the result of executing the formal expression $\varphi \in \mathrm{Exp}(EL)$ in $E$
  • An execution process for an operational environment $E = \{E_k \mid k \in N\}$ and executable language $EL$ is defined with respect to an execution procedure $\mathit{envprc}_E: \mathrm{Exp}(EL) \times N \to N$ such that for any executable expression $\varphi \in \mathrm{Exp}(EL)$ and operational state index $k \in N$, $\mathit{envprc}_E(\varphi, k)$ is the index of the image state.
  • any specific execution of an expression $\varphi \in \mathrm{Exp}(EL)$ is then simply the operational state $\mathit{envexc}_E(\varphi) = E_{\mathit{envprc}_E(\varphi, n)} \in E$ for some index $n \in N$.
  • $\mathrm{Sem}(NL) \to \mathrm{Trm}(PL)$ is the protocol interpretation, which assigns to each semantic object $x \in \mathrm{Sem}(NL)$ its corresponding protocol term in $\mathrm{Trm}(PL)$
  • $\mathrm{Mod}(XL) \times \mathrm{Exp}(XL) \to \mathrm{Exp}(PL)$ is the protocol encoding, which assigns to each pair $(M, \varphi)$ the protocol expression in $\mathrm{Exp}(PL)$ associated with the formal expression $\varphi \in \mathrm{Exp}(XL)$ as interpreted with respect to the formal model $M \in \mathrm{Mod}(XL)$
  • $\mathit{pclrep}: \mathrm{Exp}(PL) \to \mathrm{Env}(NL)$ is the external protocol representation such that $\mathit{pclrep}(X) \in \mathrm{Env}(NL)$ is the external operational environment encoded into the protocol expression $X \in \mathrm{Exp}(PL)$
  • METASCRIPT is an effective transformation $\mathit{mscript}: \mathrm{Exp}(NL) \to \mathrm{Exp}(PL)$ of natural language expressions into formal expressions in the abstract language $PL$ associated with a universal protocol.
  • this protocol is explicitly designed to accommodate further effective translations $\mathit{exttrn}_E: \mathrm{Exp}(XL) \to \mathrm{Exp}(EL)$ for specific operational environments $E \in \mathrm{Env}(NL)$ with associated executable languages $EL$.
  • the protocol is XMP (for external Media Protocol), which is designed and used as a universal transaction medium for diverse digital components in a networked environment.
  • the invention and functionality of this system is not limited to any specific protocol.
  • Any text is presented to the system as a string of ASCII characters.
  • the specific method of text presentation is irrelevant. It may be voice-generated, entered via keyboard, optically scanned, etc.
  • the system itself does not explicitly include a module for speech recognition, optical character recognition, etc., but any effective mechanism for presenting ASCII text to the system is sufficient for its successful operation.
  • the text parser partitions any string of ASCII characters into sequences of sequences of ASCII substrings, using embedded blank spaces as delimiters. These inner sequences of substrings represent potential expressions, and the substrings themselves represent potential words or punctuation marks. For terminological convenience, any such parsed substring is called a pretoken, and the partitioned sequence in which it occurs is called a preexpression.
  • the parser produces the following sequence of pretokens:
  • Syntactic Processing Refer to module 3.2 of FIG. 3.
  • Type Association Refer to submodule 3.2.1 of FIG. 3.
  • Type Assignment Refer to submodule 3.2.1.1 of FIG. 3.
  • Each parsed pretoken is checked against the lexicon for recognition by the system. If a pretoken is recognized, i.e. if that string is included in the lexicon as an actual token, then it is immediately associated with some type. This first order assignment of type to token is only tentative at this point in the process, since correct type association requires more than mere lexical recognition of text strings as legitimate tokens. Accordingly, these initially assigned types are considered to be "virtual" types.
  • Virtual type assignment on the sample preexpression parsed above yields the following list of token/type pairs (lexical terms) in the form $(a, \mathit{lextyp}(0,a))$ for each indicated token $a \in \mathrm{Tok}(NL)$ as shown in FIG. 4a. Since all pretokens listed there are recognized by the system as actual tokens, the parsed preexpression becomes an expression for further processing. However, the presence of ambiguous (virtual) types classifies this expression as "virtual". Promotion to an "actual" expression is deferred to the type contextualization process.
  • Lexical Insertion Refer to submodule 3.2.1.3 of FIG. 3.
  • If a pretoken is not recognized by the system, then the user is prompted for information concerning lexical type and reference which may be properly associated with that pretoken in order to form a lexical term appropriate for inclusion in the lexicon. Upon such inclusion, the pretoken becomes a system token. This is the primary mechanism, and the only one under direct user control, by which the system learns new vocabulary.
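Lexical insertion can be sketched as an ordered insert into the lexicon, driven by a callback that stands in for the user prompt. The names `insert_term`, `resolve_unrecognized` and `ask_user`, and the (token, type, reference) entry layout, are assumptions for illustration only.

```python
from bisect import insort

def insert_term(lexicon, token, lexical_type, reference):
    """Insert a (token, type, reference) entry, keeping lexicographic order."""
    entry = (token, lexical_type, reference)
    if entry not in lexicon:
        insort(lexicon, entry)
    return lexicon

def resolve_unrecognized(lexicon, preexpression, ask_user):
    """Add every unrecognized pretoken to the lexicon.

    ask_user(pretoken) is a hypothetical callback returning (type, reference),
    standing in for the interactive prompt described in the text.
    """
    known = {token for token, _, _ in lexicon}
    for pretoken in preexpression:
        if pretoken not in known:
            lexical_type, reference = ask_user(pretoken)
            insert_term(lexicon, pretoken, lexical_type, reference)
            known.add(pretoken)
    return lexicon
```

In an interactive setting `ask_user` would prompt for the type and reference. In a batch setting it could consult a supplementary word list instead.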
  • Type Contextualization Refer to submodule 3.2.1.4 of FIG. 3. Second order type assignment uses the initial lexical type assignments to tokens in an expression as data for a contextual process by which actual types may be assigned to tokens depending on their syntactic roles relative to other tokens in that expression. For example, in the sentence
  • This method of type reassignment through syntactic context alone represents the simplest, most direct form of disambiguation employed by the system. More subtle mechanisms for further disambiguation, such as local type reduction and semantic contextualization, are deployed later in the process.
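A minimal sketch of second order type assignment follows. The single rule it applies (an ambiguous verb/object token is read as an object after a determiner or adjective, and as a verb otherwise) is an invented example rule, not one taken from the patent. The type labels `vrb/obj`, `det` and `adj` are likewise hypothetical.

```python
# Hypothetical virtual (ambiguous) types that need contextual resolution.
VIRTUAL_TYPES = {"vrb/obj"}

def contextualize(terms):
    """Promote virtual types to actual types using the preceding term's type."""
    actual = []
    for token, typ in terms:
        if typ in VIRTUAL_TYPES:
            previous = actual[-1][1] if actual else None
            # Assumed rule: after a determiner or adjective, read as an object;
            # otherwise read as a verb.
            typ = "obj" if previous in {"det", "adj"} else "vrb"
        actual.append((token, typ))
    return actual
```

For the hypothetical input `[("print", "vrb/obj"), ("the", "det"), ("print", "vrb/obj")]`, the first occurrence resolves to a verb and the second to an object, so the same token receives different actual types from context alone.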
  • final type association on the sample expression being processed yields the list shown in FIG. 4b of refined lexical terms in the form $(a, \mathit{lextyp}(j,a))$ for each indicated token $a \in \mathrm{Tok}(NL)$, where $j > 0$ is the (perhaps alternative) index corresponding to the appropriate refined type.
  • Proper syntactic dependencies between terms in an expression are established by means of a type reduction matrix.
  • the dimensions of this matrix help to determine the level of syntactic disambiguation quickly achievable by the system.
  • a 2-dimensional matrix, which maps pairs of tokens into their relative reduction ordering, is minimal.
  • Higher dimensional reduction matrices, which map longer sequences of tokens into their relative reduction orderings, are increasingly effective, but more costly to implement in terms of memory requirements and processing speed.
  • An optimal number of reduction dimensions depends critically on a complex combination of implementation constraints and performance criteria. Whatever the number of dimensions used, however, the system is designed to establish correct syntactic relationships on a relatively global scale (at least over an entire expression) simply by executing local term reductions in the proper order.
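One way to picture local term reduction with a 2-dimensional matrix is sketched below. The matrix contents, the priority scheme and the function name `reduce_terms` are assumptions; the patent's actual matrix is not reproduced in this text. Each step picks the applicable adjacent pair with the lowest priority value, records a reduction link (dependent, head), and removes the dependent from the working sequence.

```python
# Hypothetical 2-dimensional reduction matrix, keyed by adjacent type pairs.
# Value: (which side is the dependent, priority); lower priority reduces first.
REDUCTION = {
    ("det", "obj"): ("L", 0),  # determiner attaches to the following object
    ("adj", "obj"): ("L", 0),  # adjective attaches to the following object
    ("vrb", "obj"): ("R", 1),  # object attaches to the preceding verb
}

def reduce_terms(terms):
    """Return the reduction links (dependent, head) in the order applied."""
    work = list(terms)
    links = []
    while len(work) > 1:
        candidates = []
        for i in range(len(work) - 1):
            rule = REDUCTION.get((work[i][1], work[i + 1][1]))
            if rule:
                candidates.append((rule[1], i, rule[0]))
        if not candidates:
            break  # no further local reduction applies
        _, i, side = min(candidates)
        dep, head = (i, i + 1) if side == "L" else (i + 1, i)
        links.append((work[dep], work[head]))
        del work[dep]
    return links
```

A higher dimensional matrix would be keyed by longer type sequences instead of pairs, which increases the table size and lookup cost, in line with the memory and speed trade-off noted above.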
  • the local term reduction sequence is constructed as shown in FIG. 5.
  • Term Inversion Refer to submodule 3.2.2.3 of FIG. 3.
  • Proper chains of syntactic dependencies among tokens in a sentence, and the resultant dependencies between those chains, are constructed by means of an effective inversion of the term reduction process. These chains are then used to generate branches of the syntactic tree which represents the underlying syntactic structure of the expression being processed.
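Term inversion can be sketched as turning a list of reduction links into a head-to-dependents map and then reading off the maximal dependency chains, each of which corresponds to a branch of the tree. The function name `invert` and the use of bare token strings as node identifiers are illustrative assumptions. The sketch assumes each token occurs only once in the expression.

```python
def invert(links):
    """Build a head -> dependents map and list the maximal dependency chains.

    links is a list of (dependent, head) pairs in reduction order.
    """
    children = {}
    dependents = set()
    for dep, head in links:
        children.setdefault(head, []).append(dep)
        dependents.add(dep)
    # A root is a head that never occurs as a dependent.
    roots = [head for head in children if head not in dependents]
    chains = []

    def walk(node, path):
        path = path + [node]
        if node not in children:
            chains.append(path)  # reached a leaf: the chain is maximal
        for child in children.get(node, []):
            walk(child, path)

    for root in roots:
        walk(root, [])
    return children, chains
```

For the hypothetical links `[("the", "file"), ("file", "print"), ("now", "print")]`, the only root is `print`, and the chains are the two root-to-leaf paths through `file` and through `now`.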
  • term inversion produces the maximal chains shown in FIG. 6a.
  • term reduction has effected critical type modulations in some of the subordinate terms, viz. obj -> obd (general object -> direct object), obj -> obi (general object -> indirect object), obj -> obs (general object -> verb subject), and obj -> obp (general object -> preposition object).
  • Each expression for a language is represented at the fundamental syntactic structural level by a tree, i.e. by a finitely branching partial order of finite length having elements corresponding to lexical terms ordered by their associated type reductions.
  • This tree has a canonical representation as a natural branching diagram, which in turn becomes represented as a (usually composite) term (syntactic complex) in an associated syntactic algebra.
  • the branches of this tree correspond directly to the chains of syntactic dependencies established in the term resolution process.
  • the syntactic tree representing the sample text being processed becomes structured as shown in FIG. 6b.
  • Nodes of a syntactic tree and the syntactic order relations which hold between them are interpreted in terms of a many-sorted syntactic structure.
  • the canonical language for this class of structures is based on the notion of a type/token complex to form basic terms, and the operation of term application on pairs of these complexes to form more complicated or composite terms.
  • Each term, whether basic or composite, ultimately corresponds to a semantic object with certain characteristics, and these objects in their various configurations comprise the domains and internal relations of the syntactic structures considered.
  • the formal correlate of an expression in a natural language is a term in the associated syntactic algebra which represents the effective translation of the expression into a form suitable for direct system interpretation.
  • This algebra is based on an operation of term application which, given a translation of natural language expressions into algebraic terms, transforms syntactic dependencies at the level of natural language into specific formal relations at an algebraic level.
  • the syntactics for natural language becomes a matter of effective computation.
  • trerep : Tre(NL) → Syn(NL) is given by induction on syntactic dependence:
  • trerep(d,a) = syntrm(a,d) for any trivial (single node) tree (d,a)∈Tre(NL)
  • trerep(t L t') = trerep(t)[trerep(t')] for any trees t,t'∈Tre(NL)
  • trerep(t L (t',t")) = (trerep(t)[trerep(t')])[trerep(t")] for any trees t,t',t"∈Tre(NL)
  • Semantic Processing: Refer to module 3.3 of FIG. 3.
  • Semantic Representation: Refer to submodule 3.3.1 of FIG. 3.
  • Trm(NL) → Ref(NL) forms the basis for the interpretation of terms in the semantic algebra Sem(NL) as objects constructed over the reference domain Ref(NL).
  • each lexical term (token/type pair) (a,e)∈Trm(NL) is explicitly associated in the lexicon with a reference lexref(a,e)∈Ref(NL) which instantiates the term in a given environment; in fact, no lexical term is properly defined until such a reference for it is specified, since this definition forms the principal link between a natural language term and its intended meanings.
  • the basic objects of the semantic algebra are simply these lexical references, i.e. Ref(NL) ⊂ Sem(NL).
  • semrep(0) = 1 where 0∈Syn(NL) is the null complex and 1∈Sem(NL) is the identity object
  • c) composite - semrep(q)*semrep(q')∈Sem(NL) for any complexes q,q'∈Syn(NL)
  • Sem(NL) derived from the individual lexical references lexref(a_j,e_j)∈Ref(NL) ⊂ Sem(NL).
  • Sem(NL)×Sem(NL) → Sem(NL) is based on a notion similar to class inheritance from the formal practice of object-oriented design (OOD). Specifically, each reference in Sem(NL) is instantiated as an object with certain characteristics, and the product x*y of two objects x,y∈Sem(NL) is simply the object z∈Sem(NL) generated in the inheritance lattice as the minimal common upper bound of x and y, with y dominant over x on issues of conjunctive consistency. Note that by virtue of consistency dominance this product need not be commutative, i.e.
  • Tns(NL) = Syn(NL)⊗Sem(NL) defined over the representation semrep : Syn(NL) → Sem(NL) is composed of tensored correlations q⊗semrep(q) of syntactic terms q∈Syn(NL) and associated semantic representations semrep(q)∈Sem(NL), which form the direct computational basis for formal interpretations of natural language expressions in terms of a fundamental transaction paradigm.
  • syntactic relationships encoded in a syntactic complex q(p)∈Syn(NL) derived from an expression p∈Exp(NL) permit an exact internal structuring of the associated semantic complex semrep(q(p))∈Sem(NL).
  • Formal Representation: Refer to submodule 3.3.2 of FIG. 3.
  • syntactic term 7Jc//77 ⁇ z(q,ABC)eSyn(NL) is the leading subterm of q of type ABC; in particular, subterms /?c/rr7 «(q,ENV)eSyn(NL) of metatype ENVeTyp(PL) (indicating "environment") play a significant role.
  • Trm(XL) establishes the syntactic basis for the formalization of NL, since expressions ⁇(p)∈Exp(XL) constructed as formal interpretations of natural expressions p∈Exp(NL) are built from terms in Trm(XL) associated through trmint with appropriate objects in Sem(NL).
  • ⁇ (p) q(p)® r77Z77z ⁇ :(x(p))
  • MEAO Modified Environment Action Object
  • K(p) is the minimal substructure of M(p) satisfying ⁇(p), i.e. M(p) ⊇ K(p) ⊨ ⁇(p).
  • the expression ⁇ (p) is not an executable translate of the natural expression p; instead, it is a formal interpretation of the precise conditions, relative to an abstract model of an operational environment, upon which an effective executable form of p may be constructed.
  • the ultimate transition to an appropriate external operational environment is properly accomplished by means of the metasemantic protocol.
  • the construction of a metaformal expression X(p)∈Exp(PL) as a fully effective translation of an expression p∈Exp(NL) is simply a machine interpretable codification of the satisfaction relation M(p) ⊨ ⁇(p).
  • XMPL: When PL is XMPL, the formal language associated with the universal protocol XMP, this metasyntactic construction is straightforward. Indeed, the syntax of XMPL is explicitly designed to accommodate MEAO semantics. In general, what is specified by any XMPL expression is an operational environment, a transaction mapping, a mapping domain, a mapping range, and a mapping argument. Refinements to any of these elements are indicated by appropriate nestings of subelements.
  • XMP transaction protocol
  • ENV operational environment
  • MAP transaction mapping
  • DMN mapping domain
  • RNG mapping range
  • ARG mapping argument
  • This metaformal syntactic scheme provides an effective universal template for abstract computations, and most significantly, for further translations into exact forms which are executable in specific operational environments.
  • an external representation extrep : Mod(XL) → Env(NL) which associates appropriate external operational environments with internal formal models.
  • these internal structures are explicitly designed as abstract models of external environments in order to accommodate this representation; accordingly, the external representation extrep may be viewed as the inverse of an internal representation intrep : Env(NL) → Mod(XL) arising from a prior analysis of those operational environments which are relevant to the system.
  • the universal transaction protocol XMP easily accommodates externalization since its associated formal language XMPL naturally translates into executable languages such as SQL (Structured Query Language) and SMTPL (the language of the mail protocol SMTP), and even executable extensions of XML (Extensible Markup Language), in the manner indicated above, where EL is any of these executable languages.
  • XMPL forms a natural bridge between the internal semantics of the system and the external semantics of the environments in which it operates.
  • the control structure of XMP then makes these external translations effective by finally facilitating in appropriate operational environments the execution of commands originally issued by the user in natural language.
  • X∈Exp(XMPL) encodes the instruction to execute in environment E the operation f_E with domain A_E and range B_E on the argument x_E, where the external elements
  • METASCRIPT is currently implemented as a translation from a natural language (English) into the formal language (XMPL) associated with a universal transaction protocol (XMP: external Media Protocol).
  • XMPL formal language
  • XMP external Media Protocol
  • this formal language is suitable for interpretation by digital components in external operational environments into executable machine instructions.
  • METASCRIPT allows a human user to communicate naturally in an effective manner with (and through) any programmable device, hence networked configurations of such devices, compatible with the protocol XMP.
  • METASCRIPT is a natural language interface to any sufficiently capable digital environment, whether it be a single device such as a computer, a cellular phone, a PDA (Personal Digital Assistant), a kitchen appliance, an automobile, or a whole network of such devices such as a local intranet or the global Internet.
  • a complete NLP system the combined technologies of METASCRIPT and XMP enable a seamless integration of all participants, human and digital alike, into an effective ubiquitous network.
  • the fundamental algorithm upon which METASCRIPT is based employs a reduction to formal syntactic structures over terms defined in an extensible lexicon.
  • This term reduction incorporates both syntactic type and semantic context to achieve an effective formal representation and interpretation of the meaning conveyed by any natural language expression.
  • Extensibility of the lexicon under specific user direction provides the capacity for the system to expand its knowledge of vocabulary and usage, and consequently, offers an effective mechanism under user control for establishing definite incremental enhancements to the system's linguistic capabilities, hence substantially increasing the system's familiarity with (and competence in) particular operational environments.
  • the system automatically gains functional complexity through its object-oriented semantics, whereby the addition of formal terms having composite objects generated by algebraic representations of natural linguistic terms as direct references permits unlimited efficiency and sophistication of machine comprehensible natural language usage. Put simply, the system learns as it goes. Moreover, any desired level of syntactic disambiguation is attainable by increasing the local dimensionality of the underlying reduction matrix, though this feature is part of the underlying algorithm, and therefore independent of user modulation.

Abstract

Presented is a system and method for converting or translating expressions in a natural language such as English into machine executable expressions in a formal language. This translation enables a transformation from the syntactic structures of a natural language into effective algebraic forms for further exact processing. The invention utilizes algorithms employing a reduction of sequences of terms defined over an extensible lexicon into formal syntactic and semantic structures. This term reduction incorporates both syntactic type and semantic context to achieve an effective formal representation and interpretation of the meaning conveyed by any natural language expression.

Description

COMPUTER SYSTEM WITH NATURAL LANGUAGE TO MACHINE LANGUAGE TRANSLATOR
BACKGROUND OF THE INVENTION:
The present invention is directed to a system which translates natural (human) language into an abstract formal language. This formal language is explicitly designed to serve as a universal template for further translations into a comprehensive variety of machine languages which are executable in specific operational environments. Extensive efforts have been made, many articles have been published, and many patents have been issued, all directed toward the goal of providing computers with the capacity to understand natural (human) language sufficiently well to respond reliably and accurately to directives issued from human users. Many companies and research groups, such as AT&T, IBM, and Microsoft, and an assortment of academic institutions, are presently working on natural language processing (NLP).
To date, many different approaches have been tried to provide a system which effectively converts natural language to a formal language for computer applications. One such approach is disclosed in an article published by Microsoft Corporation titled "Microsoft Research: Natural Language Processing Hits High Gear" dated May 3, 2000. The article discloses that Microsoft is heavily focused on a database of logical forms, called MindNet (TM), and the creation of a machine translation application. It is stated that MindNet is an initiative in an area of research called "example-based processing", whereby a computer processes input based on something it has encountered before. The MindNet database is created by storing and weighting the semantic graphs produced during the analysis of a document or collection of documents. The system uses this database to find links in meaning between words within a single language or across languages. These stored relationships among words give the system a basis for "understanding", thereby allowing the system to respond to natural language input. MindNet apparently contains the contents of several dictionaries and an encyclopedia to increase its level of understanding. Another approach is disclosed in Microsoft U.S. Patent No. 5,966,686. This approach provides a rule-based computer system for semantically analyzing natural language sentences. The system first transforms an input sentence into a syntactic parse tree. Semantic analysis then applies three sets of semantic rules to create an initial logical form graph from this tree. Additional rules provide semantically meaningful labels to create additional logical form graph models and to unify redundant elements. The final logical form graph represents the semantic analysis of the input sentence.
Yet another, and apparently more common, approach is provided by U.S. Patent No. 5,895,466, wherein a database stores a plurality of answers which are indexed to natural language keys. The natural language device receives a natural language question over the network from a remote device and the question is analyzed using a natural language understanding system. Based on this analysis, the database is then queried and an answer is provided to the remote device.
Applicant is aware that various other approaches toward providing a conversion from natural language to some machine language have been tried. However, the prior art has not provided a truly effective conversion system of this sort.
SUMMARY OF THE INVENTION:
Presented is a system and method for converting or translating expressions in a natural language such as English into machine executable expressions in a formal language. This translation enables a transformation from the syntactic structures of a natural language into effective algebraic forms for further exact processing. The invention utilizes algorithms employing a reduction of sequences of terms defined over an extensible lexicon into formal syntactic and semantic structures. This term reduction incorporates both syntactic type and semantic context to achieve an effective formal representation and interpretation of the meaning conveyed by any natural language expression.
The foregoing features and advantages of the present invention will be apparent from the following more particular description of the invention. The accompanying drawings, listed herein below, are useful in explaining the invention.
BRIEF DESCRIPTION OF THE DRAWINGS:
FIG. 1 shows the hardware architecture of a computer system comprising the natural language converter of the present invention;
FIG. 2 shows the general process and data flow of the inventive system;
FIG. 3 shows a more detailed flow diagram for the inventive system;
FIG. 4a shows the results of virtual type assignment applied to a sample text;
FIG. 4b shows the results of actual type assignment for the same text;
FIG. 5 shows the term reduction sequence for a sample text;
FIG. 6a shows the sequence of dependency chains for a sample text;
FIG. 6b shows the associated syntactic tree for the same text;
FIG. 7a shows the schema of structures and maps involved in the external interpretation of a text; and
FIG. 7b shows this external interpretation schema as controlled by a metasemantic protocol.
BRIEF DESCRIPTION OF THE INVENTION:
Refer to FIG. 1 for an overview of the system architecture. As mentioned above, the inventive system, called METASCRIPT (TM), provides a method for translating expressions in a natural language such as English into machine executable expressions. In the embodiment of the system and method to be described, the user inputs text in a natural language through some input device to a known computer system which may comprise a standalone computer system, a local network of computing devices, or a global network such as the Internet, using wired land lines, wireless communication, or some combination thereof, etc. This computer system includes memory for storing data, and a data processor. The text may be entered into the client device or local VDM (Video Display Monitor) (1.1) by any suitable means, such as direct input via keyboard (1.1.1), voice input via speech recognition means (an SR system) (1.1.2), or indirect input via optical scanning (an OCR system) (1.1.3). The natural language text input to the system is passed along the network or local bus (1.3) to a server or local CPU (Central Processing Unit) (1.2) where it is processed in accordance with the inventive method and system. This processed output of the system is then provided to the system for distribution to the original input device (1.1), or to other collateral devices (1.4) which may be one or more digital computers, mobile devices, etc. The inventive system thus comprises a natural language interface to any sufficiently capable digital environment.
Refer now to FIG. 2 for an overview of the process and data flow of the inventive system. The invention will be subsequently discussed in more detail herein below. Natural language text input is entered by the user (2.0) into the internal system (2.1) by means of a text processing module (2.1.1) which parses the text.
The output of the text processing module comprises a parsed sequence of preexpressions which is entered into the syntactic processing module (2.1.2) which provides syntactic type information, establishes proper syntactic dependencies between terms in expressions, and represents these expressions as complexes in a syntactic algebra. The output of the syntactic processing module, comprising a sequence of these syntactic complexes, is entered into the semantic processing module (2.1.3) in order to achieve a semantic interpretation of the input text. The output of the semantic processing module, comprising a formal interpretation of the input text, is entered into an external system by means of the external processing module (2.2.1), which finally provides a sequence of executable expressions derived from the input text for use in a specific operational environment (2.2.2).
As noted above, the means for providing text input to the system, such as through a keyboard, scanner, or speech recognition system, are well known in the art and are commercially available. Another standard component of the present system is a text parser, construed here in an extremely narrow sense as a limited process restricted to partitioning text strings into syntactic subcomponents such as paragraphs, sentences, and words. As such, the text parser discussed herein does not provide further linguistic information such as grammatical types, syntactic dependencies, semantic import, etc. Such limited text parsers are standard components of any natural language processing system, and exceedingly well known in the art. Yet another component in the present system which plays a relatively standard role is the lexicon or "electronic dictionary". In general, lexicons are also well known in the art and are discussed in many patents including U.S. Patent Nos. 5,371,807; 5,724,594; 5,794,050; and 5,966,686. However, the notion and function of "virtual" types, which play a significant syntactic categorization role in the passive specification of lexical terms, and hence strongly contribute to the definition of the particular lexicon used in the inventive system, are not standard, and so require careful description. On the other hand, since text input devices, text parsers, and their operation are so well known, they will not be further described in detail herein.
Refer now to FIG. 3, which shows more details of the inventive system. The components, modules, and submodules of the inventive system are enumerated for convenient reference so that the operation and application of the system and method may be described in detail.
As mentioned above, natural language text is entered by the user (3.0) into the text input submodule (3.1.1) of the text processing module (3.1) via any suitable means including a keyboard or a speech recognition system. For the purposes of this discussion, the user input signal is simply some linguistic data stream which is digitized into a string of ASCII characters. This ASCII string is the input text.
In order to clarify the following discussion, it is helpful to note that any natural language text is typically organized into a sequence of paragraphs, each of which is a sequence of sentences, each of which is a sequence of words, each of which is a sequence of characters (alphanumeric symbols). All of this nested syntactic structure must be taken into account if an effective interpretive analysis is to be achieved. The role of the text parser is to determine and then present these nested sequential structures to the system for further processing. Thus in general, the adequate output of the text parser is a sequence of sequences of sequences of sequences of ASCII characters. This level of generality, however, tends to obscure the basic points of any useful description of the inventive system, so a technical compromise is adopted herein, whereby any text is considered to comprise a sequence of sentences, or more properly, of expressions, each of which comprises a sequence of words. Until recognized by the system as a meaningful unit of linguistic analysis, however, any such word in a text is simply treated as a partitioned substring of the input text string. Thus the proper output of the text parser is considered here to be a sequence of sequences of "pretokens", where a pretoken is a text fragment which is a candidate for a word, i.e. an ASCII (sub)string presented for recognition as a system "token". The system lexicon is a lexicographically ordered list of such tokens (with associated type and reference data), and recognition by the system of a pretoken as an actual token is simply a matter of exact string comparison.
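The narrow parsing and recognition steps described above can be sketched as follows. This is only an illustrative sketch: the sentence-splitting rule, the function names `parse_text` and `is_token`, and the five-entry lexicon are assumptions introduced here, not details taken from the specification. What it does reflect is the stated design: the parser output is a sequence of sequences of pretokens, the lexicon is a lexicographically ordered list, and recognition is an exact string comparison.

```python
import bisect
import re

# Hypothetical lexicon: a lexicographically ordered list of tokens.
# (Type and reference data are omitted in this sketch.)
LEXICON = sorted(["a", "bob", "message", "send", "to"])


def parse_text(text):
    """Partition a text into preexpressions, each a list of pretokens.

    Assumption: expressions end at '.', '!' or '?', and pretokens are
    separated by whitespace. No grammatical information is produced.
    """
    preexpressions = []
    for fragment in re.split(r"[.!?]+", text):
        pretokens = fragment.split()
        if pretokens:
            preexpressions.append(pretokens)
    return preexpressions


def is_token(pretoken, lexicon=LEXICON):
    """Recognize a pretoken by exact string comparison in the ordered list."""
    i = bisect.bisect_left(lexicon, pretoken)
    return i < len(lexicon) and lexicon[i] == pretoken


parsed = parse_text("send a message to bob. send a fax to bob.")
unrecognized = [w for expr in parsed for w in expr if not is_token(w)]
```

Here `unrecognized` collects the pretokens that are absent from the toy lexicon; in the system described, such pretokens are the ones offered to the user for lexical insertion.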
Accordingly, the output of the text parser (3.1.2) is a sequence of sequences of pretokens, or sequence of "preexpressions", which is then passed to the type assignment submodule (3.2.1.1) of the type association submodule (3.2.1), where syntactic processing is initiated. Each pretoken is checked against the system lexicon (3.2.0) for its status as a recognized lexical token. If a pretoken is recognized, i.e. if the string comprising that pretoken is included in the lexicon as an actual token (with associated syntactic and semantic data), then it is assigned a lexically associated syntactic type. The system determines at decision node (3.2.1.2) whether all the pretokens from the entered text have been recognized as system tokens. If the determination is negative, then as indicated by the "no" connection to the lexical insertion submodule (3.2.1.3), the user is given the option to add the unrecognized pretokens to the system as tokens with associated type and reference data, i.e. to insert new terms into the lexicon, for further processing. On the other hand, if the determination is affirmative, then the resulting sequence of sequences of lexically typed tokens, or sequence of "virtual" expressions, is passed along the "yes" connection to the type contextualization submodule (3.2.1.4). This submodule initiates a second order type assignment which uses the initial (or virtual) lexical type assignments as data for a contextual process which may reassign these initial types depending on the relative syntactic roles of the tokens in the virtual expressions being processed. Upon complete (re)assignment of appropriate types to tokens, each virtual expression is promoted to an "actual" expression, and each token/type pair becomes a fully functional lexical term with associated semantic data.
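A minimal sketch of the two-stage type association may help fix ideas. Everything concrete in it is an assumption made for illustration: the type labels (`vrb`, `art`, `prp`, `obj`), the `vrb|obj` label standing in for a virtual (syntactically ambiguous) type, the lexicon contents, and the single contextual rule. The actual contextualization process of the system is not specified at this point in the text.

```python
# Hypothetical lexicon: token -> lexical type. "vrb|obj" stands in for a
# virtual (syntactically ambiguous) type; the other labels are actual types.
LEXICON = {
    "send": "vrb",
    "the": "art",
    "mail": "vrb|obj",
    "to": "prp",
    "bob": "obj",
}


def assign_types(preexpression, lexicon):
    """First-order assignment: return (typed tokens, unrecognized pretokens)."""
    typed, unrecognized = [], []
    for pretoken in preexpression:
        if pretoken in lexicon:
            typed.append((pretoken, lexicon[pretoken]))
        else:
            unrecognized.append(pretoken)
    return typed, unrecognized


def contextualize(virtual_expression):
    """Second-order assignment using a toy rule (an assumption):
    an ambiguous verb/object token is an object after an article,
    and a verb otherwise.
    """
    actual = []
    for token, lexical_type in virtual_expression:
        if lexical_type == "vrb|obj":
            previous_type = actual[-1][1] if actual else None
            lexical_type = "obj" if previous_type == "art" else "vrb"
        actual.append((token, lexical_type))
    return actual


typed, unknown = assign_types(["send", "the", "mail", "to", "bob"], LEXICON)
# Only a fully recognized (virtual) expression proceeds to contextualization.
expression = contextualize(typed) if not unknown else None
```

A non-empty `unknown` list corresponds to the "no" branch at the decision node, where the user may insert the missing terms into the lexicon before processing continues.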
Thus the output of the type association submodule (3.2.1) of the syntactic processing module (3.2) comprises a sequence of (actual) expressions, and is passed to the term correlation submodule (3.2.2.1) of the term resolution module (3.2.2). The output of this submodule is a sequence of sequences of fully correlated lexical terms, which is then entered into the term reduction submodule (3.2.2.2), wherein proper syntactic dependencies between terms in an expression are established by means of a type reduction matrix. The output of this submodule is a sequence of sequences of reduction links, which is entered into the term inversion submodule (3.2.2.3), wherein these reduction links are used to construct syntactic trees, each tree representing a processed expression. The resulting sequence of syntactic trees is passed to the syntactic representation submodule (3.2.3), wherein each expression is then represented as a syntactic complex, i.e. a (usually composite) term in the syntactic algebra.
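The reduction and inversion steps can be illustrated with a toy example. The sketch below assumes a 2-dimensional reduction matrix that maps a pair of adjacent types to a reduction priority and to the side on which the dominant term sits; the matrix entries, type labels, and function names are all hypothetical, and the procedure is a simplified stand-in rather than the system's actual algorithm. It shows only the general idea that local reductions, executed in the proper order, yield links from which a tree over the whole expression can be built.

```python
# Hypothetical 2-dimensional reduction matrix:
# (left type, right type) -> (priority, side of the dominant term).
# Lower priority values are reduced first.
REDUCTION_MATRIX = {
    ("art", "obj"): (0, "right"),  # article depends on the following object
    ("prp", "obj"): (1, "left"),   # object depends on the preposition
    ("vrb", "obj"): (2, "left"),   # object depends on the verb
    ("vrb", "prp"): (3, "left"),   # preposition depends on the verb
}


def reduce_terms(terms):
    """Return (links, root), where links are (dependent, head) index pairs."""
    live = list(range(len(terms)))  # positions not yet reduced
    links = []
    while len(live) > 1:
        candidates = []
        for k in range(len(live) - 1):
            pair = (terms[live[k]][1], terms[live[k + 1]][1])
            if pair in REDUCTION_MATRIX:
                priority, head_side = REDUCTION_MATRIX[pair]
                candidates.append((priority, k, head_side))
        if not candidates:
            raise ValueError("no applicable reduction")
        _, k, head_side = min(candidates)
        left, right = live[k], live[k + 1]
        head, dependent = (left, right) if head_side == "left" else (right, left)
        links.append((dependent, head))
        live.remove(dependent)
    return links, live[0]


def invert(links, root, terms):
    """Build a nested (token, [subtrees]) tree from the reduction links."""
    children = {}
    for dependent, head in links:
        children.setdefault(head, []).append(dependent)

    def build(i):
        return (terms[i][0], [build(c) for c in sorted(children.get(i, []))])

    return build(root)


terms = [("send", "vrb"), ("the", "art"), ("mail", "obj"), ("to", "prp"), ("bob", "obj")]
links, root = reduce_terms(terms)
tree = invert(links, root, terms)
```

With these toy entries the verb ends up as the root, with the object and the prepositional phrase as its branches. A higher-dimensional matrix would key on longer type sequences instead of pairs.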
Semantic processing (3.3) is initiated in the semantic representation submodule (3.3.1), wherein the input sequence of syntactic complexes from the syntactic processing module (3.2) is represented as a full semantic complex, i.e. a structure of internal objects in the semantic algebra. This semantic complex is then passed to the formal representation submodule (3.3.2), wherein the input semantic complex is represented as a formal structure adhering to a fundamental transaction paradigm. This formal semantic model is then combined with the sequence of syntactic complexes output from the syntactic processing module to form the input to the formal interpretation submodule (3.3.3), wherein a sequence of formal expressions is constructed as an interpretation of the presented syntactic and semantic data.
In addition, the output of the formal representation submodule is passed to the external representation submodule (3.4.1) of the external processing module (3.4), wherein a specific external representation appropriate for the formal semantic data presented is identified. This external representation is combined with the sequence of formal expressions output from the formal interpretation submodule to form the input to the external interpretation submodule (3.4.2), wherein a sequence of executable expressions is constructed accordingly for ultimate processing in the appropriate operational environment (3.5).
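The data flow among the modules of FIG. 3, as described in the preceding paragraphs, can be summarized as a wiring sketch. The stage names and the `run_pipeline` function are assumptions introduced for illustration, and the stages themselves are left as caller-supplied functions; the sketch records only which output feeds which input.

```python
def run_pipeline(text, stages):
    """Wire the stages together in the order described for FIG. 3.

    `stages` maps hypothetical stage names to callables supplied by the caller.
    """
    preexpressions = stages["text_processing"](text)                    # 3.1
    complexes = stages["syntactic_processing"](preexpressions)          # 3.2
    semantic_complex = stages["semantic_representation"](complexes)     # 3.3.1
    formal_model = stages["formal_representation"](semantic_complex)    # 3.3.2
    # The formal model is combined with the syntactic complexes.
    formal_expressions = stages["formal_interpretation"](formal_model, complexes)  # 3.3.3
    # The formal model also selects an external representation.
    external_rep = stages["external_representation"](formal_model)      # 3.4.1
    return stages["external_interpretation"](external_rep, formal_expressions)     # 3.4.2


# Stand-in stages that only record the order in which they are called.
trace = []


def stub(name):
    def stage(*inputs):
        trace.append(name)
        return (name, inputs)
    return stage


STAGE_NAMES = [
    "text_processing",
    "syntactic_processing",
    "semantic_representation",
    "formal_representation",
    "formal_interpretation",
    "external_representation",
    "external_interpretation",
]
result = run_pipeline("send the mail to bob", {n: stub(n) for n in STAGE_NAMES})
```

Note that the output of the formal representation stage is consumed twice, once by formal interpretation and once by external representation, which is why the flow is not a simple linear chain.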
DETAILED DESCRIPTION OF THE INVENTION:
INTRODUCTION:
METASCRIPT is a translation from a natural language into an executable formal language. This translation is essentially a transformation from the syntactic structures of natural language into effective algebraic forms suitable for further processing. The formal semantics which finally determines the ensuing interpretations and executions of these formal expressions in external operational environments is object-oriented.
The fundamental algorithm upon which METASCRIPT is based employs a reduction to formal syntactic structures over terms defined in an extensible lexicon. This term reduction incorporates both syntactic type and semantic context to achieve an effective formal representation and interpretation of the meaning conveyed by any natural language expression. Extensibility of the lexicon under specific user direction provides the capacity for the system to expand its knowledge of vocabulary and usage, and consequently, offers an effective mechanism under user control for establishing definite incremental enhancements to the system's linguistic capabilities, hence substantially increasing the system's familiarity with (and competence in) particular operational environments. Put simply, the system learns as it goes. In addition, any desired level of syntactic disambiguation is attainable by increasing the local dimensionality of the underlying reduction matrix, though this feature is part of the underlying algorithm, and therefore independent of user modulation.
It should be noted that METASCRIPT is not a speech recognition system. Instead, it is a fully capable natural language interpreter. Specifically, METASCRIPT translates natural language expressions into expressions in a formal language associated with an abstract network protocol. A more detailed account of this process follows.
NOTATION:
Standard mathematical notation is used to clarify the presentation of certain technical features of the system. In particular, the following set-theoretical notation appears throughout this discussion:
a) a set is a collection of things, called elements. For example, N = {0,1,2,3,...} is the set of natural numbers.
Note: In general, a set A is most conveniently determined by some property P of its elements, indicated by use of so-called "set-builder notation" as A = {x | P(x)} = the set of things x satisfying property P.
b) the expression 'x∈A' indicates that the thing x is an element of the set A
c) the expression 'A⊆B' indicates that the set A is a subset of the set B, i.e. that every element of A is an element of B as well
d) a map (or function) is a relation between two sets A,B such that each element in A is assigned a unique element in B. The expression 'f : A → B' indicates that f is a map from the set A to the set B, i.e. f assigns a unique element y = f(x)∈B to each element x∈A. The composition of maps f : A → B and g : B → C on sets A,B,C is the map h = g∘f : A → C defined such that h(x) = g(f(x))∈C for any x∈A.
Note: A program is a function which maps input onto output in an effective manner, i.e. by means of a finite, discrete, deterministic procedure; in fact, any process or procedure is effective precisely to the extent that it is executable as a program of this sort.
e) for any sets A,B, the Cartesian product A×B consists of all pairs (x,y) such that x∈A and y∈B, i.e. A×B = {(x,y) | x∈A , y∈B}
f) for any sets A,B the union A∪B is the set consisting of all elements x such that x∈A or x∈B, i.e. A∪B = {x | x∈A or x∈B}; for any collection C of sets A_i, the union ∪C is the set consisting of unions over all sets A_i in C, i.e. ∪C = {x | x∈A_i for some A_i in C}
g) for any algebras A,B and representation f : A → B, the correlated tensor product A⊗_f B is the distinguished subset of A×B which consists of all pairs (x,f(x)) for x∈A, i.e. A⊗_f B = {(x,f(x)) | x∈A} = the graph of f; for an implicit representation, the map subscript may be omitted, i.e. A⊗B = A⊗_f B for some implicit f : A → B
h) for any set A, Seq(A) is the set of finite sequences from A, i.e. Seq(A) = {(a_0,...,a_n) | a_j∈A , n∈N}
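The set-theoretical notions listed above have direct finite counterparts, shown in the following sketch. The particular sets and maps are arbitrary examples chosen here, and since the set of all finite sequences is infinite, the helper `sequences` enumerates it only up to a given length.

```python
from itertools import product

A = {0, 1, 2}
B = {"even", "odd"}


def f(x):
    """A map f : A -> B."""
    return "even" if x % 2 == 0 else "odd"


def g(y):
    """A map g : B -> N."""
    return len(y)


def compose(g, f):
    """Composition of maps: h(x) = g(f(x))."""
    return lambda x: g(f(x))


h = compose(g, f)

# Cartesian product: all pairs (x, y) with x in A and y in B.
cartesian = set(product(A, B))

# Correlated tensor product over f: the graph of f, a subset of the product.
tensor = {(x, f(x)) for x in A}


def sequences(elements, max_len):
    """Finite sequences over `elements`, up to length max_len (inclusive)."""
    for n in range(max_len + 1):
        yield from product(sorted(elements), repeat=n)
```

The correlated tensor product keeps exactly one pair per element of the first set, which is what makes it usable later as a pairing of each syntactic term with its semantic representation.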
language: a structure over the following components:
a) alphabet: a set of basic symbols
b) punctuation symbols: a set of basic symbols disjoint from the alphabet
c) words: admissible sequences of basic symbols
d) punctuations: admissible sequences of punctuation symbols
e) expressions: admissible sequences of words and/or punctuations
f) sentences: complete expressions
g) syntax: a specification of which
- sequences of basic symbols are admissible as words
- sequences of punctuation symbols are admissible as punctuations
- sequences of words and/or punctuations are admissible as expressions
- expressions are admissible as sentences
h) semantics: a scheme of interpretation over words whereby expressions acquire meaning with respect to certain external structures
A number of languages enter into this discussion:
1) natural language: any of the human languages in current use, e.g. English, each characterized by an informal, and hence notoriously ambiguous, syntax and semantics
2) formal language: a highly structured language, usually mathematical in origin and use, characterized by a uniquely readable, recursive syntax and an extensional, usually first-order semantics; in short, a language for which the syntax and semantics is effectively unambiguous
3) object language: a formal language which is interpretable relative to a class of extensional structures, i.e. a formal language with an object-oriented semantics
4) protocol language: a formal language which mediates transactions between addressable nodes on a network
5) executable language: a formal, programmable language which encodes instructions directly implementable by a suitably capable machine such as a computer
system: the integrated process which manifests METASCRIPT, and which may be implemented as software running on any programmable device
string: a sequence of ASCII characters
text: a string presented to the system as the fundamental unit of initial input
parser: a process which partitions texts into sequences of sequences of substrings
preexpression: a sequence of substrings of some text, distinguished as a unit of syntactic processing by the text parser
lexicon: a system-specific, indexed, lexicographically ordered list of designated strings, called tokens, each of which is associated with certain syntactic and semantic information; in particular, each token is associated with a lexical type, which may be virtual (syntactically ambiguous) or actual (syntactically unambiguous); furthermore, each token which is associated with an actual type is also associated with a lexical reference, which provides basic semantic information
Note: A single string may serve as a token with multiple entries, associated with a number (including 1) of virtual types and a number of actual types, reflecting that token's multiple syntactic roles, e.g. as a verb and an object, or an object and an adjective, etc. Although there is considerable variability in such syntactic multiplicities among lexical entries, it is still the case that every token is associated with at least one actual type.
token: a string recognized by the system in the sense that it is included in the system lexicon
type: a syntactic category used to organize semantically similar tokens; there are three sorts:
1) virtual: a lexical type which is ambiguous
2) actual: a lexical type which is not ambiguous
3) reduced: a syntactic type which has specific semantic functionality upon term reduction
term: there are six sorts:
1) lexical: a token/type pair in the lexicon with associated reference data
2) reduced: a token/type pair for which the type is reduced
3) syntactic: an element of the syntactic algebra associated with a language
4) semantic: an element of the semantic object algebra associated with a language
5) tensored: an element of the semantic tensor algebra associated with a language
6) formal: an interpretable element of a formal language
reference: there are two sorts:
1) internal: an object with which a term is associated, either in the lexicon or in the semantic object algebra
2) external: an object with which an internal semantic object is associated in some operational environment
expression: a sequence of tokens
sentence: a syntactically correct, semantically complete expression
chain: a linearly ordered set of nodes (usually comprising a subset of some tree)
tree: a partially ordered set of nodes
model: a semantic structure M for a formal language FL consisting of
a) a set Dom(M) of individuals (called the "domain" of M)
b) a set Rln(M) of relations on Dom(M)
c) a set Obj(M) of objects (each object containing elements of Dom(M) as elements; also, there is a null object 0∈Obj(M) for technical reasons)
d) a set Map(M) of functions between objects
These individuals, objects, relations, and functions are formal interpretations of corresponding terms in the language FL, and expressions of FL which correctly describe configurations of these various elements which actually obtain in the model are considered to be "true" in the model, or "satisfied" by the model; this satisfaction relation between a model M and an expression φ of the language FL is denoted as " M |= φ ", meaning that "M is a model of φ", "M satisfies φ", "φ is satisfied in M", "φ is true in M", or "φ holds in M".
operational environment: a dynamic structure E represented as a series of static states E_k (for k∈N), each of which comprises a model for an executable language EL
Throughout this discussion, the terms "internal" and "external" are applied relative to the system itself. Thus any component, module, process, or method which is part of the system is considered to be internal, while any user of the system or operational environment for the system is regarded as being external. This distinction is more logical than physical, since an external operational environment for the system may reside on the same computer system or device which hosts the system.
SYSTEM SETS:
The following sets are defined relative to a language L:
Sym(L): the set of basic symbols for L, usually including an alphabet and various punctuation symbols
Tok(L): the set of lexical tokens for L, i.e. a distinguished subset of Seq(Sym(L))
Ltp(L): the set of lexical types for L
Rtp(L): the set of reduced types for L, including a null type
Trm(L): the set of lexical terms for L, i.e. a distinguished subset of Tok(L)×Ltp(L)
Rdn(L): the set of reduced terms for L, i.e. a distinguished subset of Tok(L)×Rtp(L)
Ref(L): the domain of lexical references for L
Lex(L): the lexicon for L, i.e. a distinguished subset of N×Trm(L)×Ref(L) consisting of a lexicographically ordered list of (index/token/type/reference) entries
Txt(L): the set of texts for L, i.e. a subset of Seq(Sym(L)) determined by user input
Prx(L): the set of preexpressions for L, i.e. a subset of Seq(Seq(Sym(L))) determined by the parser
Exp(L): the set of expressions for L, i.e. a distinguished subset of Seq(Tok(L))
Snt(L): the set of sentences for L, i.e. a distinguished subset of Exp(L)
Tre(L): the set of syntactic trees for L, each having reduced terms from Rdn(L) as nodes
Syn(L): the syntactic algebra for L
Sem(L): the semantic object algebra for L
Tns(L): the semantic tensor algebra for L, viz. Tns(L) = Syn(L)⊗Sem(L) for a canonical map f: Syn(L) → Sem(L)
Mod(L): the set of internal formal models for L
Env(L): the set of external operational environments for L
SYSTEM MAPS:
The following maps are defined relative to a natural language NL, an associated object language XL, a protocol language PL, and an executable language EL associated with an operational environment E:
txtprs : Txt(NL) → Seq(Prx(NL)) is the text parser which maps any text string s∈Txt(NL) onto a sequence txtprs(s)∈Seq(Prx(NL)) of preexpressions, each of which is a sequence of pretokens, i.e. strings which are candidates for tokens (including punctuations) as determined by inclusion in the lexicon
lextyp : N×Tok(NL) → Typ(NL) is the lexical type assignment such that lextyp(n,a)∈Typ(NL) is the lexical type associated in the lexicon with the indexed token (n,a)
lexref : Tok(NL)×Typ(NL) → Ref(NL) is the lexical reference assignment such that lexref(a,e)∈Ref(NL) is the reference associated in the lexicon with the token/type pair (a,e)∈Tok(NL)×Typ(NL)
Note: If lexref(a,e) = 0 (= null reference) then either (a,e) is not a recognized token/type pair, i.e. there is no entry in the lexicon of the form (n/a/e/r) for any index n, or e is a virtual type; accordingly, the set of lexical terms for NL is defined as Trm(NL) = {(a,e)∈Tok(NL)×Typ(NL) | lexref(a,e) ≠ 0}. Thus lexical reference properly reduces to an assignment lexref : Trm(NL) → Ref(NL) on actual lexical terms, rather than on random token/type pairs, or even on lexical entries in general.
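The relationship between lexicon entries (n/a/e/r), the maps lextyp and lexref, and the derived set Trm(NL) may be sketched as follows. This is a minimal illustration under stated assumptions: the Entry record, the sample entries, the reference strings, and the type code "vrt" (standing for a virtual type) are hypothetical, and None plays the role of the null reference 0.

```python
from typing import NamedTuple, Optional

class Entry(NamedTuple):
    index: int            # alternative index n for this token
    token: str            # token a
    ltype: str            # lexical type e
    ref: Optional[str]    # lexical reference r; None stands for the null reference

# Hypothetical entries; "vrt" is an invented code for a virtual (ambiguous) type.
LEXICON = [
    Entry(0, "to", "vrt", None),
    Entry(1, "to", "inf", "infinitive marker"),
    Entry(2, "to", "prp", "direction toward"),
    Entry(0, "go", "act", "move"),
]

def lextyp(n, a):
    """Lexical type of the indexed token (n, a), or None if there is no such entry."""
    for entry in LEXICON:
        if entry.index == n and entry.token == a:
            return entry.ltype
    return None

def lexref(a, e):
    """Reference of the token/type pair (a, e); None if unrecognized or virtual."""
    for entry in LEXICON:
        if entry.token == a and entry.ltype == e:
            return entry.ref
    return None

def trm():
    """Trm(NL): the token/type pairs whose lexical reference is not null."""
    return {(x.token, x.ltype) for x in LEXICON if lexref(x.token, x.ltype) is not None}
```

Under these assumptions the pair ("to", "vrt") is a lexicon entry but not a lexical term, because its reference is null.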
syntrm : Rdn(NL) → Syn(NL) is the assignment of a basic syntactic term q(d,a) = syntrm(a,d)∈Syn(NL) to each reduced term (a,d)∈Rdn(NL)
•[•] : Syn(NL)×Syn(NL) → Syn(NL) is term application on pairs of complexes q,q'∈Syn(NL) such that q[q']∈Syn(NL) is the composite complex induced by the application of the syntactic term q to the term q'
Note: In fact, Syn(NL) is simply the algebraic closure under the operation of term application over the set of basic complexes associated with the lexical terms of NL by means of the map syntrm on Rdn(NL), i.e. Syn(NL) is defined by induction on term application as follows:
a) null complex: there is a null term 0∈Syn(NL)
b) basic complex: q(d,a) = syntrm(a,d)∈Syn(NL) for any reduced term (a,d)∈Rdn(NL)
c) composite complex: q[q']∈Syn(NL) for any q,q'∈Syn(NL); q[0] = 0[q] = q for any q∈Syn(NL)
d) completeness: q∈Syn(NL) iff q is either the null complex or a basic complex, or a composite complex generated by a finite sequence of term applications over a set of basic complexes
syntyp : Syn(NL) → Rtp(NL) is the syntactic type designator defined by induction on Syn(NL) as follows:
a) null complex: syntyp(0) = nul∈Rtp(NL), where 0∈Syn(NL) is the null term and nul is the null type
b) basic complex: syntyp(q(d,a)) = d∈Rtp(NL) for any reduced term (a,d)∈Rdn(NL)
c) composite complex: syntyp(q[q']) = syntyp(q)∈Rtp(NL) for any q,q'∈Syn(NL)
subtrm : Syn(NL)×Rtp(NL) → Syn(NL) is the subterm designator such that subtrm(q,d)∈Syn(NL) is the leading subcomplex of syntactic type d∈Rtp(NL) in the syntactic complex q∈Syn(NL). The precise definition of subtrm proceeds by induction on Syn(NL) as follows:
a) null complex: subtrm(0,d) = 0 for any type d∈Rtp(NL), where 0∈Syn(NL) is the null term
b) basic complex: subtrm(q(d',a),d) = q(d',a) if d'=d, and subtrm(q(d',a),d) = 0 if d'≠d, for any type d∈Rtp(NL) and reduced term (a,d')∈Rdn(NL)
c) composite complex: subtrm(q[q'],d) = subtrm(q'',d)∈Syn(NL) for any q,q'∈Syn(NL) and type d∈Rtp(NL), where q''=q if subtrm(q,d) ≠ 0, and q''=q' otherwise
Note: subtrm(q,d) = 0 if there is no subcomplex of type d∈Rtp(NL) in the syntactic complex q∈Syn(NL); otherwise, syntyp(subtrm(q,d)) = d∈Rtp(NL). Also, by default, subtrm(q,nul) = 0 for all q∈Syn(NL), i.e. the null term is a subterm of every syntactic complex.
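The inductive definitions of term application, syntyp and subtrm can be transcribed almost literally into code. The following Python sketch is one possible reading, not the system's implementation: the classes Basic and App, the function name apply, and the use of None for the null complex 0 are assumptions introduced here. The type codes act, obi and obd are those used elsewhere in this description.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Basic:
    d: str   # reduced type
    a: str   # token

@dataclass(frozen=True)
class App:
    head: "Term"   # q in q[q']
    arg: "Term"    # q' in q[q']

Term = Union[Basic, App, None]   # None stands for the null complex 0

def apply(q, q2):
    """Term application q[q'], with q[0] = 0[q] = q."""
    if q is None:
        return q2
    if q2 is None:
        return q
    return App(q, q2)

def syntyp(q):
    """Syntactic type: nul for the null complex, else the type of the leading term."""
    if q is None:
        return "nul"
    if isinstance(q, Basic):
        return q.d
    return syntyp(q.head)

def subtrm(q, d):
    """Leading subcomplex of type d in q, or None (the null complex) if absent."""
    if q is None or d == "nul":
        return None
    if isinstance(q, Basic):
        return q if q.d == d else None
    found = subtrm(q.head, d)
    return found if found is not None else subtrm(q.arg, d)

# (act, send)[(obi, Bob)][(obd, email)]
q = apply(apply(Basic("act", "send"), Basic("obi", "Bob")), Basic("obd", "email"))
```

With this representation, syntyp(q) is the type of the governing term and subtrm searches the head before the argument, as in clause c) of the definition.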
synrep : Exp(NL) → Syn(NL) is the syntactic representation such that synrep(p)∈Syn(NL) is the syntactic complex associated with the expression p∈Exp(NL)
Note: The system currently constructs synrep as a composition of the maps syntre : Exp(NL) → Tre(NL) and trerep : Tre(NL) → Syn(NL), where syntre associates a syntactic tree syntre(p)∈Tre(NL) with each expression p∈Exp(NL), and trerep associates a syntactic complex trerep(t)∈Syn(NL) with each syntactic tree t∈Tre(NL) over the assignment of basic syntactic terms q(d,a) = syntrm(a,d) to nodes (a,d)∈t, i.e. the representation synrep = trerep∘syntre : Exp(NL) → Syn(NL)
* : Sem(NL)×Sem(NL) → Sem(NL) is the semantic product on semantic objects in the semantic object algebra Sem(NL) such that *(x,y) = x*y∈Sem(NL) is the minimal common upper bound of any pair of objects x,y in the induced class inheritance lattice on Sem(NL), with the second term of the product dominant over the first relative to consistency issues, i.e. y dominant over x in x*y
Note: In fact, Sem(NL) is simply the algebraic closure under the semantic product over the kernel Ref(NL) of lexical references associated with the lexical terms of NL by means of the map lexref : Trm(NL) → Ref(NL), i.e. Sem(NL) is defined by induction on the semantic product as follows:
a) identity object: there is an identity element 1∈Sem(NL)
b) lexical object: lexref(a,e)∈Sem(NL) for any lexical term (a,e)∈Trm(NL), i.e. Ref(NL)⊂Sem(NL)
c) composite object: x*y∈Sem(NL) for any objects x,y∈Sem(NL); x*1 = 1*x = x for each x∈Sem(NL)
d) completeness: x∈Sem(NL) iff x is either the identity object or a lexical object, or a composite object generated by a finite sequence of semantic products over a set of lexical objects
semrep : Syn(NL) → Sem(NL) is the semantic representation such that semrep(q)∈Sem(NL) is the semantic reference associated with the syntactic term q∈Syn(NL); moreover semrep is the implicit representation in the definition of the semantic tensor algebra Tns(NL) = Syn(NL)⊗Sem(NL) = Syn(NL)⊗_semrep Sem(NL)
Note: Semantic reference makes contact with lexical reference in that semrep(q(d(e),a)) = lexref(a,e)∈Ref(NL) for every basic complex q(d(e),a)∈Syn(NL) naturally associated (after term reduction: (a,e) → (a,d(e))) with a reduced term (a,d(e))∈Rdn(NL); moreover, semrep : Syn(NL) → Sem(NL) is a homomorphism from the syntactic algebra to the semantic object algebra in that
a) null → identity: semrep(0) = 1 where 0∈Syn(NL) is the null complex and 1∈Sem(NL) is the identity object
b) basic → lexical: semrep(q(d(e),a)) = lexref(a,e)∈Sem(NL) for any lexical term (a,e)∈Trm(NL) and associated reduced term (a,d(e))∈Rdn(NL), where q(d(e),a) = syntrm(a,d(e))∈Syn(NL) is a basic complex
c) composite → composite: semrep(q[q']) = semrep(q)*semrep(q')∈Sem(NL) for any complexes q,q'∈Syn(NL)
tnsrep : Syn(NL) → Tns(NL) is the tensor representation such that tnsrep(q) = q⊗semrep(q)∈Tns(NL) is the tensored complex associated with the syntactic term q∈Syn(NL), and implicitly, with the semantic reference semrep(q)∈Sem(NL)
fmlrep : Sem(NL) → Mod(XL) is the formal representation such that fmlrep(x)∈Mod(XL) is the formal model associated with the semantic context x∈Sem(NL)
fmlint : Tns(NL) → Exp(XL) is the formal interpretation such that fmlint(u)∈Exp(XL) is the formal expression associated with the tensored complex u∈Tns(NL)
modref_M : Trm(XL) → M for each M∈Mod(XL) is the model reference such that modref_M(v)∈M is the model element referenced by the term v∈Trm(XL); if there is no actual reference for v in M then modref_M(v) = 0∈M is the null object by default
extrep : Mod(XL) → Env(NL) is the external representation such that extrep(M)∈Env(NL) is the external operational environment associated with the internal formal model M∈Mod(XL)
extint_E : Trm(PL) → Trm(EL) is the external term interpretation relative to an operational environment E∈Env(NL) with an associated executable language EL such that extint_E(v)∈Trm(EL) is the external term corresponding to the protocol term v∈Trm(PL)
exttrn_E : Exp(XL) → Exp(EL) is the external translation relative to an operational environment E∈Env(NL) with an associated executable language EL such that exttrn_E(φ)∈Exp(EL) is the executable translate of the formal expression φ∈Exp(XL)
envexc_E : Exp(EL) → E is the execution process for an operational environment E∈Env(NL) with an associated executable language EL such that envexc_E(ξ)∈E is the result of executing the formal expression ξ∈Exp(EL) in E
Note: An execution process for an operational environment E = {E_k | k∈N} and executable language EL is defined with respect to an execution procedure envprc_E : Exp(EL)×N → N such that for any executable expression ξ∈Exp(EL) and operational state index k∈N, the image state E_J(k) |= ξ (i.e. the model E_J(k) satisfies the expression ξ), where the image index J(k) = envprc_E(ξ,k) > k. In terms of a sequence of operations in E, any specific execution of an expression ξ∈Exp(EL) is then simply the operational state envexc_E(ξ) = E_J(n)∈E, with J(n) = envprc_E(ξ,n), for some index n∈N.
pclint : Sem(NL) → Trm(PL) is the protocol interpretation such that pclint(x)∈Trm(PL) is the protocol term corresponding to the semantic object x∈Sem(NL)
pclcod : Mod(XL)×Exp(XL) → Exp(PL) is the protocol encoding such that pclcod(M,φ)∈Exp(PL) is the protocol expression associated with the formal expression φ∈Exp(XL) as interpreted with respect to the formal model M∈Mod(XL)
pclrep : Exp(PL) → Env(NL) is the external protocol representation such that pclrep(X)∈Env(NL) is the external operational environment encoded into the protocol expression X∈Exp(PL)
pcltrn : Exp(PL) → Exp(EL) is the external protocol translation such that pcltrn(X)∈Exp(EL) is the formal expression encoded into the protocol expression X∈Exp(PL) for execution in the external operational environment E = pclrep(X)∈Env(NL).
extref : Exp(PL)×Trm(PL) → Env*(NL) = Env(NL) ∪ (∪Env(NL)) is the uniform external reference such that extref(X,T) is the external element referenced by the term of metatype T∈Typ(PL) in the protocol expression X∈Exp(PL); if there is no actual reference of this type then extref(X,T) = 0 is the null object by default
Note: The uniform range Env*(NL) = Env(NL) ∪ (∪Env(NL)) of extref, where ∪Env(NL) is the combined set of elements from all environments E∈Env(NL), accommodates reference with respect to terms of metatype ENV∈Typ(PL) as well as all other terms, since the reference extref(X,ENV) is an operational environment for any protocol expression X∈Exp(PL), i.e. extref(X,ENV)∈Env(NL), whereas the referenced external element extref(X,ABC) for any other metatype ABC∈Typ(PL) is an element of the operational environment extref(X,ENV), i.e. ABC ≠ ENV implies extref(X,ABC)∈extref(X,ENV); therefore, the range of extref must be mixed between operational environments and elements of these environments.
envref_E : Trm(PL) → E for each E∈Env(NL) is the operational environment reference such that envref_E(v)∈E is the environment element referenced by the term v∈Trm(PL); if there is no actual reference for v in E then envref_E(v) = 0∈E is the null object by default
Put succinctly, METASCRIPT is an effective transformation mscript : Exp(NL) → Exp(PL) of natural language expressions into formal expressions in the abstract language PL associated with a universal protocol. Using a formal metasemantics over the object language XL associated with NL, this protocol is explicitly designed to accommodate further effective translations exttrn_E : Exp(XL) → Exp(EL) for specific operational environments E∈Env(NL) with associated executable languages EL.
It should be noted that a particular protocol is utilized in certain sections of this description for the purposes of specificity and clarity. In this exemplary embodiment, the protocol is XMP (for external Media Protocol), which is designed and used as a universal transaction medium for diverse digital components in a networked environment. However, the invention and functionality of this system is not limited to any specific protocol.
SYSTEM PROCESSES:
Text Processing: Refer to module 3.1 of FIG. 3.
Text Input: Refer to submodule 3.1.1 of FIG. 3.
Any text is presented to the system as a string of ASCII characters. The specific method of text presentation is irrelevant. It may be voice-generated, entered via keyboard, optically scanned, etc. The system itself does not explicitly include a module for speech recognition, optical character recognition, etc., but any effective mechanism for presenting ASCII text to the system is sufficient for its successful operation.
The following text string represents a typical input:
Send Bob an email asking him if he is going to go to his appointment by himself.
This example will be carried throughout the following discussion in order to illustrate the METASCRIPT process in some detail.
Text Parser: Refer to submodule 3.1.2 of FIG. 3.
The text parser partitions any string of ASCII characters into sequences of sequences of ASCII substrings, using embedded blank spaces as delimiters. These inner sequences of substrings represent potential expressions, and the substrings themselves represent potential words or punctuation marks. For terminological convenience, any such parsed substring is called a pretoken, and the partitioned sequence in which it occurs is called a preexpression.
Applied to the sample text string introduced above, the parser produces the following sequence of pretokens:
(send, Bob, an, email, asking, him, if, he, is, going, to, go, to, his, appointment, by, himself, .)
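The partitioning performed by the text parser can be sketched in Python as follows. This is an illustrative approximation only: the function name txtprs is borrowed from the map defined above, while the regular expression, the TERMINALS set, and the rule that a preexpression ends at terminal punctuation are assumptions made here. The sketch also leaves capitalization unchanged, whereas the listing above shows the first pretoken in lower case.

```python
import re

# Assumed set of punctuation marks that close a preexpression.
TERMINALS = {".", "?", "!"}

def txtprs(text):
    """Partition a text into preexpressions, each a list of pretokens."""
    # Words (letters, digits, apostrophes) or single punctuation characters;
    # blank spaces act purely as delimiters.
    pretokens = re.findall(r"[A-Za-z0-9']+|[^\sA-Za-z0-9']", text)
    preexpressions, current = [], []
    for tok in pretokens:
        current.append(tok)
        if tok in TERMINALS:
            preexpressions.append(current)
            current = []
    if current:                      # trailing material without a terminal mark
        preexpressions.append(current)
    return preexpressions

SAMPLE = ("Send Bob an email asking him if he is going to go "
          "to his appointment by himself.")
parsed = txtprs(SAMPLE)
```

Applied to the sample text, the sketch yields a single preexpression of eighteen pretokens, the last of which is the terminal period.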
Syntactic Processing: Refer to module 3.2 of FIG. 3.
Type Association: Refer to submodule 3.2.1 of FIG. 3.
Type Assignment: Refer to submodule 3.2.1.1 of FIG. 3.
Each parsed pretoken is checked against the lexicon for recognition by the system. If a pretoken is recognized, i.e. if that string is included in the lexicon as an actual token, then it is immediately associated with some type. This first order assignment of type to token is only tentative at this point in the process, since correct type association requires more than mere lexical recognition of text strings as legitimate tokens. Accordingly, these initially assigned types are considered to be "virtual" types.
Virtual type assignment on the sample preexpression parsed above yields the following list of token/type pairs (lexical terms) in the form (a,lextyp(0,a)) for each indicated token a∈Tok(NL) as shown in FIG. 4a. Since all pretokens listed there are recognized by the system as actual tokens, the parsed preexpression becomes an expression for further processing. However, the presence of ambiguous (virtual) types classifies this expression as "virtual". Promotion to an "actual" expression is deferred to the type contextualization process.
Lexical Insertion: Refer to submodule 3.2.1.3 of FIG. 3.
If a pretoken is not recognized by the system, then the user is prompted for information concerning lexical type and reference which may be properly associated with that pretoken in order to form a lexical term appropriate for inclusion in the lexicon. Upon such inclusion, the pretoken becomes a system token. This is the primary mechanism, and the only one under direct user control, by which the system learns new vocabulary.
Type Contextualization: Refer to submodule 3.2.1.4 of FIG. 3.
Second order type assignment uses the initial lexical type assignments to tokens in an expression as data for a contextual process by which actual types may be assigned to tokens depending on their syntactic roles relative to other tokens in that expression. For example, in the sentence
I want to go to the store.
the word "to" is of ambiguous type because it appears in two different guises, once as the prefix of the infinitive "to go" and later as the preposition in the phrase "to the store". The system is able to discern such grammatical differences and assign correct types based on syntactic context.
This method of type reassignment through syntactic context alone represents the simplest, most direct form of disambiguation employed by the system. More subtle mechanisms for further disambiguation, such as local type reduction and semantic contextualization, are deployed later in the process.
In any case, final type association on the sample expression being processed yields the list shown in FIG. 4b of refined lexical terms in the form (a,lextyp(j,a)) for each indicated token a∈Tok(NL), where j>0 is the (perhaps alternative) index corresponding to the appropriate refined type. Note that all the ambiguous (virtual) lexical types have been replaced by unambiguous (actual) types. This type reassignment promotes a virtual expression to an actual expression suitable for further syntactic processing.
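The contextual reassignment of the ambiguous token "to" in the sentence "I want to go to the store" can be illustrated with a toy rule. The following Python sketch is not the system's contextualization method: the CANDIDATES table, the assignment of types to the other words, and the single rule "choose inf when the next token is a verb, otherwise prp" are assumptions made only for this example. The type codes act, inf, prp, obj and adj are those appearing elsewhere in this description.

```python
# Hypothetical mini-lexicon: token -> set of candidate actual types.
CANDIDATES = {
    "I": {"obj"},
    "want": {"act"},
    "to": {"inf", "prp"},   # ambiguous: infinitive prefix or preposition
    "go": {"act"},
    "the": {"adj"},
    "store": {"obj"},
}

def contextualize(tokens):
    """Assign one actual type per token; only the inf/prp ambiguity is handled."""
    typed = []
    for i, tok in enumerate(tokens):
        options = CANDIDATES[tok]
        if len(options) == 1:
            typed.append((tok, next(iter(options))))
            continue
        # Toy context rule: an ambiguous token directly before a verb is "inf".
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        next_is_verb = nxt is not None and CANDIDATES.get(nxt) == {"act"}
        typed.append((tok, "inf" if next_is_verb else "prp"))
    return typed

typed = contextualize("I want to go to the store".split())
```

The first "to" is followed by the verb "go" and is typed inf; the second is followed by "the" and is typed prp.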
Term Resolution: Refer to submodule 3.2.2 of FIG. 3.
Term Correlation: Refer to submodule 3.2.2.1 of FIG. 3.
Indirect references by various syntactic elements such as pronouns must be correlated to direct references elsewhere in the text. This task is achieved through type matching in the context of appropriate syntactic configurations. For example, in the sentence
Inform Bob about the next meeting, and tell him that it will happen later than usual.
the pronoun "him" naturally correlates with "Bob", and "it" with "meeting". The system establishes such correlations first by executing a simple type/reference matching such as (him → male person → Bob) on antecedents, and then by evaluating such matches for probable fit according to context. For example, in the extended sentence
Inform Bob about the next meeting at the factory, and tell him that it will happen later than usual.
there are two possible type/reference matches for "it", viz. (it → some object → {meeting, factory}), but clearly, on the basis of object characteristics, the match (it → some object → meeting) is a better fit than (it → some object → factory) in the context of the phrase "... it will happen later than usual" since meetings happen more readily than factories do, as any capable scheme of lexical reference will indicate. It should be noted, of course, that this process must be applied to a text as a whole, not just to individual expressions, so that indirect references across multiple expressions may be correlated properly.
Accordingly, the following simple term correlations are made within the sample expression being processed:
5) (him, ppm) → (Bob, pnm)
7) (he, ppm) → (Bob, pnm)
13) (his, psm) → (Bob's, psm)
16) (himself, prm) → (Bob, pnm)
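The two-stage correlation described above (type/reference matching on antecedents, followed by evaluation of fit in context) can be sketched for the "meeting at the factory" example. Everything in this Python fragment is hypothetical: the ANTECEDENTS records, the "kind" labels, the can_happen characteristic, and the needs_event flag are stand-ins for whatever lexical reference data an actual implementation would consult.

```python
# Hypothetical antecedents from the extended example sentence, in text order
# of consideration; "can_happen" is an invented object characteristic.
ANTECEDENTS = [
    {"token": "Bob", "kind": "male person", "can_happen": False},
    {"token": "factory", "kind": "object", "can_happen": False},
    {"token": "meeting", "kind": "object", "can_happen": True},
]

# Hypothetical pronoun-to-kind table used for simple type/reference matching.
PRONOUN_KIND = {"him": "male person", "he": "male person", "it": "object"}

def correlate(pronoun, needs_event=False):
    """Return the best-fitting antecedent token for a pronoun, or None."""
    # Stage 1: type/reference matching.
    matches = [a for a in ANTECEDENTS if a["kind"] == PRONOUN_KIND[pronoun]]
    # Stage 2: contextual fit, e.g. the subject of "will happen" should be an event.
    if needs_event:
        preferred = [a for a in matches if a["can_happen"]]
        matches = preferred or matches
    return matches[0]["token"] if matches else None
```

For instance, correlate("it", needs_event=True) selects "meeting" over "factory" because only the former is marked as something that can happen, mirroring the argument in the text.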
Term Reduction: Refer to submodule 3.2.2.2 of FIG. 3.
Proper syntactic dependencies between terms in an expression are established by means of a type reduction matrix. The dimensions of this matrix help to determine the level of syntactic disambiguation quickly achievable by the system. A 2-dimensional matrix, which maps pairs of tokens into their relative reduction ordering, is minimal. Higher dimensional reduction matrices, which map longer sequences of tokens into their relative reduction orderings, are increasingly effective, but more costly to implement in terms of memory requirements and processing speed. An optimal number of reduction dimensions, of course, depends critically on a complex combination of implementation constraints and performance criteria. Whatever the number of dimensions used, however, the system is designed to establish correct syntactic relationships on a relatively global scale (at least over an entire expression) simply by executing local term reductions in the proper order.
For example, using a minimal matrix on the sample expression being processed, the local term reduction sequence is constructed as shown in FIG. 5.
Term Inversion: Refer to submodule 3.2.2.3 of FIG. 3.
Proper chains of syntactic dependencies among tokens in a sentence, and the resultant dependencies between those chains, are constructed by means of an effective inversion of the term reduction process. These chains are then used to generate branches of the syntactic tree which represents the underlying syntactic structure of the expression being processed.
On the sample reduction sequence just constructed, term inversion produces the maximal chains shown in FIG. 6a. Note that term reduction has effected critical type modulations in some of the subordinate terms, viz. obj→obd (general object → direct object), obj→obi (general object → indirect object), obj→obs (general object → verb subject), and obj→obp (general object → preposition object). These reduced types are critical data for accurate semantic processing.
Syntactic Representation: Refer to submodule 3.2.3 of FIG. 3.
Each expression for a language is represented at the fundamental syntactic structural level by a tree, i.e. by a finitely branching partial order of finite length having elements corresponding to lexical terms ordered by their associated type reductions. This tree has a canonical representation as a natural branching diagram, which in turn becomes represented as a (usually composite) term (syntactic complex) in an associated syntactic algebra. The branches of this tree correspond directly to the chains of syntactic dependencies established in the term resolution process.
For example, the syntactic tree representing the sample text being processed becomes structured as shown in FIG. 6b. Nodes of a syntactic tree and the syntactic order relations which hold between them are interpreted in terms of a many-sorted syntactic structure. The canonical language for this class of structures is based on the notion of a type/token complex to form basic terms, and the operation of term application on pairs of these complexes to form more complicated or composite terms. Each term, whether basic or composite, ultimately corresponds to a semantic object with certain characteristics, and these objects in their various configurations comprise the domains and internal relations of the syntactic structures considered.
More specifically, under METASCRIPT the formal correlate of an expression in a natural language is a term in the associated syntactic algebra which represents the effective translation of the expression into a form suitable for direct system interpretation. This algebra is based on an operation of term application which, given a translation of natural language expressions into algebraic terms, transforms syntactic dependencies at the level of natural language into specific formal relations at an algebraic level. Thus the syntactics for natural language becomes a matter of effective computation.
Each node (type/token pair) (d,a) in a syntactic tree t(p) = syntre(p)∈Tre(NL) representing an expression p∈Exp(NL) corresponds to a reduced term (token/type pair) (a,d)∈Rdn(NL), and is associated with a basic complex q(d,a)∈Syn(NL) by means of the syntactic assignment syntrm : Rdn(NL) → Syn(NL), i.e. q(d,a) = syntrm(a,d). The immediate syntactic dependence of another term (d',a') on (d,a) determined by term reduction is signified at the tree level as the order relation (d,a) L (d',a'). The algebraic operation of term application •[•] : Syn(NL)×Syn(NL) → Syn(NL) on pairs of syntactic complexes then yields composite complexes of the form q''=q[q'] induced by these syntactic dependency relations; for example, in the particular case considered here involving the terms (d,a) and (d',a'), term application results in the composite complex q''((d,a),(d',a')) = q(d,a)[q'(d',a')] = syntrm(a,d)[syntrm(a',d')].
Similarly, a chain of syntactic dependencies yields iterated applications on increasingly composite complexes and the basic complex which represents the next link in the dependency chain, i.e. the relation z_0 L z_1 L z_2 between syntactic terms z_0,z_1,z_2∈Trm(NL) yields the algebraic term
Figure imgf000018_0001
form z_0 L (z_1,z_2), whereby syntactic terms z_1 and z_2 are both directly dependent on z_0 (i.e. z_0 L z_1 and z_0 L z_2), yields the algebraic term q_0[q_1][q_2] = (q_0[q_1])[q_2] ≠ q_0[q_1[q_2]], i.e. term application is not an associative algebraic operation.
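The non-associativity of term application can be checked on a small example. In the following Python sketch complexes are represented simply as bracketed strings; the helper names basic and apply, and the use of the empty string for the null complex, are assumptions made for illustration. The three terms are taken from the sample expression.

```python
def basic(d, a):
    """Basic complex q(d, a), rendered as a type/token pair."""
    return f"({d}, {a})"

def apply(q, q2):
    """Term application q[q']; the empty string stands for the null complex."""
    if q == "":
        return q2
    if q2 == "":
        return q
    return f"{q}[{q2}]"

q0 = basic("act", "send")
q1 = basic("obd", "email")
q2 = basic("adj", "an")

chain = apply(q0, apply(q1, q2))     # chain dependence: q0[q1[q2]]
branch = apply(apply(q0, q1), q2)    # multiple dependence: (q0[q1])[q2]
```

The two results differ in bracket structure, which is exactly the distinction between a dependency chain and a pair of terms that both depend directly on the same head.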
In any case, continued term applications of this sort, as explicitly induced by the dependency structure of the syntactic tree, thus yield an effective representation of any expression in a language NL as a (usually composite) term in the associated syntactic algebra Syn(NL). For example, the syntactic complex corresponding to the sample expression being processed is
(act, send)[(obi, Bob)][(obd, email)[(adj, an)][(ptc, asking)[(ltm, if)[(act, is)[(ptc, going)[(inf, to)[(act, go)[(prp, to)[(obp, appointment)[(adj, his)]]][(prp, by)[(obp, himself)]]]]]]]]]
This algebraic form of the expression is critical for the language processing which follows; in particular, the syntactic representation of expressions afforded by algebraic term reduction provides an effective recursion structure for accurate semantic processing.
In summary, each expression p∈Exp(NL) is represented as a syntactic tree t(p) = syntre(p)∈Tre(NL) which induces an associated syntactic complex q(t(p)) = trerep(t(p))∈Syn(NL). A detailed definition of the intermediate syntactic algebraic representation trerep : Tre(NL) → Syn(NL) is given by induction on syntactic dependence:
a) null dependence: trerep(d,a) = syntrm(a,d) for any trivial (single node) tree (d,a)∈Tre(NL)
b) direct dependence: trerep(t L t') = trerep(t)[trerep(t')] for any trees t,t'∈Tre(NL)
c) multiple dependence: trerep(t L (t',t'')) = (trerep(t)[trerep(t')])[trerep(t'')] for any trees t,t',t''∈Tre(NL)
Full syntactic representation is then simply the composition synrep = trerep∘syntre : Exp(NL) → Syn(NL).
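The induction defining trerep amounts to a left-to-right fold of term application over the dependents of each node. The following Python sketch shows one way this might be written; the tree encoding as nested ((d, a), children) pairs and the string rendering of complexes are assumptions made for illustration, and the example tree is only the first part of the sample expression.

```python
def syntrm(a, d):
    """Basic complex for the reduced term (a, d), rendered as a type/token pair."""
    return f"({d}, {a})"

def trerep(tree):
    """Syntactic complex of a tree given as ((d, a), [subtrees])."""
    (d, a), children = tree
    q = syntrm(a, d)                      # null dependence: single node
    for child in children:                # direct and multiple dependence:
        q = f"{q}[{trerep(child)}]"       # apply the result so far to the next dependent
    return q

# Fragment of the sample expression: "send" governs "Bob" and "email";
# "email" governs "an".
tree = (("act", "send"), [
    (("obi", "Bob"), []),
    (("obd", "email"), [(("adj", "an"), [])]),
])
complex_ = trerep(tree)
```

The resulting string begins in the same way as the syntactic complex given above for the full sample expression.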
Semantic Processing: Refer to module 3.3 of FIG. 3.
Semantic Representation: Refer to submodule 3.3.1 of FIG. 3.
The lexical reference map lexref: Trm(NL) - Ref(NL) forms the basis for the interpretation of terms in the semantic algebra Sem(NL) as objects constructed over the reference domain Ref(NL). Specifically, each lexical term (token/type pair) (a,e)eTrm(NL) is explicitly associated in the lexicon with a reference /exre/(a,e)eRef(NL) which instantiates the term in a given environment; in fact, no lexical term is properly defined until such a reference for it is specified, since this definition forms the principal link between a natural language term and its intended meanings. The basic objects of the semantic algebra are simply these lexical references, i.e. Ref(NL) c Sem(NL).
This first order notion of reference for lexical terms is then extended to more complex semantic terms by means of a semantic product * : Sem(NL)×Sem(NL) → Sem(NL) on objects, which allows a proper definition of a semantic representation semrep : Syn(NL) → Sem(NL) on the entire algebra Syn(NL). By induction on the composition of syntactic complexes, the full definition becomes:
a) null: semrep(0) = 1, where 0 ∈ Syn(NL) is the null complex and 1 ∈ Sem(NL) is the identity object
b) basic: semrep(q(d(e),a)) = lexref(a,e) ∈ Sem(NL) for any lexical term (a,e) ∈ Trm(NL) and associated reduced term (a,d(e)) ∈ Rdn(NL), where q(d(e),a) = syntrm(a,d(e)) ∈ Syn(NL) is a basic complex
c) composite: semrep(q[q']) = semrep(q)*semrep(q') ∈ Sem(NL) for any complexes q,q' ∈ Syn(NL)
By this definition, semrep is clearly a homomorphism from the syntactic algebra into the semantic object algebra. For example, given the dependency structure z_0 L z_1 L z_2 on certain lexical terms z_j = (a_j,e_j) ∈ Trm(NL) with associated syntactic complexes q_j = q(z_j) ∈ Syn(NL) for j = 0,1,2, there is a composite complex q_3 = q_0[q_1[q_2]] ∈ Syn(NL) constructed by iterated syntactic term application with semantic reference
semrep(q_3) = semrep(q_0[q_1[q_2]])
= semrep(q_0)*semrep(q_1[q_2])
= semrep(q_0)*(semrep(q_1)*semrep(q_2))
= lexref(a_0,e_0)*(lexref(a_1,e_1)*lexref(a_2,e_2))
in Sem(NL) derived from the individual lexical references lexref(a_j,e_j) ∈ Ref(NL) ⊂ Sem(NL).
The definition of the semantic product * : Sem(NL)×Sem(NL) → Sem(NL) is based on a notion similar to class inheritance from the formal practice of object-oriented design (OOD). Specifically, each reference in Sem(NL) is instantiated as an object with certain characteristics, and the product x*y of two objects x,y ∈ Sem(NL) is simply the object z ∈ Sem(NL) generated in the inheritance lattice as the minimal common upper bound of x and y, with y dominant over x on issues of conjunctive consistency. Note that by virtue of consistency dominance this product need not be commutative, i.e. it is not necessarily the case that x*y = y*x for all x,y ∈ Sem(NL); similarly, this product need not be associative, i.e. it is not necessarily the case that x*(y*z) = (x*y)*z for all x,y,z ∈ Sem(NL). However, it is idempotent, i.e. x*x = x for all x ∈ Sem(NL).
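A toy model may help make the algebraic properties of the product concrete. The sketch below assumes, purely for illustration, that a semantic object is a flat dictionary of characteristics and that the least common upper bound is the union of characteristics with the right operand winning any conflict. The specification does not prescribe this data structure, and the names `product`, `x` and `y` are hypothetical.

```python
def product(x: dict, y: dict) -> dict:
    """Toy semantic product x*y.

    The result carries every characteristic of x and of y; where the two
    disagree on a characteristic, y is dominant over x.
    """
    return {**x, **y}


x = {"kind": "message", "urgent": False}
y = {"kind": "email", "recipient": "Bob"}

xy = product(x, y)   # y dominates on "kind"
yx = product(y, x)   # x dominates on "kind"
```

In this toy model the product is idempotent and not commutative, as the text requires. It happens also to be associative, which the text permits but does not require.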
It is primarily through this algebraic generation of composite objects that the system gains its complexity; moreover, the addition of terms to the lexicon which have these composite objects as their direct lexical references permits unlimited efficiency and sophistication of machine comprehensible natural language usage.
The semantic tensor algebra Tns(NL) = Syn(NL)⊗Sem(NL) defined over the representation semrep : Syn(NL) → Sem(NL) is composed of tensored correlations q⊗semrep(q) of syntactic terms q ∈ Syn(NL) and associated semantic representations semrep(q) ∈ Sem(NL), which form the direct computational basis for formal interpretations of natural language expressions in terms of a fundamental transaction paradigm. Specifically, the syntactic relationships encoded in a syntactic complex q(p) ∈ Syn(NL) derived from an expression p ∈ Exp(NL) permit an exact internal structuring of the associated semantic complex semrep(q(p)) ∈ Sem(NL).
Formal Representation: Refer to submodule 3.3.2 of FIG. 3.
Associated with any expression p ∈ Exp(NL) is the semantic context c(p) = contxt(q(p),s(p)) ∈ Sem(NL), where q(p) = synrep(p) ∈ Syn(NL) is the syntactic complex representing p and s(p) ∈ Txt(NL) is the original input text from which p is derived through text parsing and type association. This semantic context c(p) is formally represented as an internal structure M(p) = finlrep(c(p)) ∈ Mod(XL), which serves as an abstract model of the operational environment in which p is to be executed after translation into a suitably effective form. As formal representatives in this modeling capacity, these internal structures for XL form the critical link between NL and any executable language EL for an external environment E.
The semantic context c(p) is constructed as an object in the algebra Sem(NL) as follows: After text parsing and type association, the text s(p) is represented as a sequence (p_0,...,p_n) ∈ Seq(Exp(NL)) of expressions p_j ∈ Exp(NL) for some n > 0, where p = p_{j(p)} for some index j(p) ∈ {0,...,n}. Note that s(p_j) = s(p) for 0 ≤ j ≤ n. The definition of the representation contxt(q(p_j),s(p)) ∈ Sem(NL) proceeds by induction on the indices j ∈ {0,...,n}, where q(p_j) ∈ Syn(NL) is the syntactic complex representing p_j. Entering into this definition is the metaterm operator pcltrm : Syn(NL)×Typ(PL) → Syn(NL) associated with the protocol language PL as described in the following section. Briefly, for any syntactic complex q ∈ Syn(NL) and metatype ABC ∈ Typ(PL), the syntactic term pcltrm(q,ABC) ∈ Syn(NL) is the leading subterm of q of type ABC; in particular, subterms pcltrm(q,ENV) ∈ Syn(NL) of metatype ENV ∈ Typ(PL) (indicating "environment") play a significant role.
Specifically, the inductive definition of contxt : Syn(NL)×Txt(NL) → Sem(NL) is as follows:
a) j = 0: contxt(q(p_0),s(p)) = semrep(pcltrm(q(p_0),ENV)) ∈ Sem(NL)
b) j = k+1 for k < n: contxt(q(p_{k+1}),s(p)) = contxt(q(p_k),s(p))*semrep(pcltrm(q(p_{k+1}),ENV)) ∈ Sem(NL)
Then c(p) = contxt(q(p),s(p)) = contxt(q(p_{j(p)}),s(p)) ∈ Sem(NL), and M(p) = finlrep(c(p)) ∈ Mod(XL) by means of the formal representation finlrep : Sem(NL) → Mod(XL). The noncommutativity of the semantic product is critical in this definition.
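The inductive construction of the context is a left fold of the semantic product over the environment objects of the expressions p_0,...,p_j. The sketch below is a minimal illustration that assumes those environment objects have already been computed and are modeled as dictionaries, with the empty dictionary standing in for the identity object. The helper names and sample data are hypothetical.

```python
from functools import reduce


def product(x: dict, y: dict) -> dict:
    """Toy semantic product: union of characteristics, right operand dominant."""
    return {**x, **y}


def contxt(env_objects: list, j: int) -> dict:
    """Context of the j-th expression.

    env_objects[i] stands for semrep(pcltrm(q(p_i), ENV)). The base case
    is env_objects[0]; each later expression is multiplied on the right,
    so more recent environment information dominates earlier information.
    """
    return reduce(product, env_objects[1:j + 1], env_objects[0])


envs = [
    {"env": "email"},                    # p_0 names an environment
    {},                                  # p_1 contributes nothing (identity object)
    {"env": "calendar", "user": "me"},   # p_2 switches environment
]
```

Because the later operand dominates, folding in the opposite order would let the earliest environment override the most recent one, which is one way to see why noncommutativity matters here.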
Formal Interpretation: Refer to submodule 3.3.3 of FIG. 3.
The formal term interpretation trmint : Sem(NL)×Mod(XL) → Trm(XL) establishes the syntactic basis for the formalization of NL, since expressions φ(p) ∈ Exp(XL) constructed as formal interpretations of natural expressions p ∈ Exp(NL) are built from terms in Trm(XL) associated through trmint with appropriate objects in Sem(NL). In some extremely approximate sense, φ(p) = q(p)⊗trmint(x(p)), where q(p) = synrep(p) ∈ Syn(NL) is the syntactic complex representing p and x(p) = semrep(q(p)) ∈ Sem(NL) is the semantic complex representing q(p).
The formal semantics, or metasemantics, by which these formal interpretations are constructed is called MEAO (for Modified Environment Action Object). It is based on a fundamental transaction paradigm for which the elementary operation is a mapping f : A → B on structures A and B in an environment M. Terms of an object language for this semantics reflect this fundamental orientation, and a basic statement in any such language is simply an assignment of the form "x_B = f(x_A)" where x_A ∈ A, x_B ∈ B, and f : A → B. More complicated expressions are constructed from these basic (or atomic) statements by means of the usual formal propositional connectives (negation, conjunction, disjunction, and implication) and quantification (existential and universal). Of course, any of the individuals, objects, functions, and environments which serve as interpreted elements for such a language may be arbitrarily articulated, thus extremely complex despite the apparent simplicity of this formal syntax; hence the notion of "modification" (the M in MEAO) plays a significant role. As previously indicated, the proper formal interpretation of an expression p ∈ Exp(NL) relative to a semantics for the object language XL is induced by a map trmint : Sem(NL) → Trm(XL). However, the actual computation of MEAO elements from the syntactic and semantic apparatus of NL is performed in terms of a formal protocol which implements this metasemantics in an effective manner.
Specifically, there is a metaterm operator pcltrm : Syn(NL)×Typ(PL) → Syn(NL) for the protocol language PL associated with XL, which is built from conditional sequences (in essence, executable routines) of syntactic subterm assignments yielded by the standard subterm designator subtrm : Syn(NL)×Rtp(NL) → Syn(NL). Distinguished semantic types T ∈ Typ(PL) determine certain syntactic subterms q(p)_T = pcltrm(q(p),T) of the syntactic complex q(p) ∈ Syn(NL), and these syntactic subterms yield semantic representations x(p)_T = semrep(q(p)_T) ∈ Sem(NL).
It is at this point that the formal term interpretation trmint : Sem(NL) → Trm(XL) comes into play as the basis of the formal semantics for XL. From the perspective of an effective metasemantics for XL, however, this term interpretation is derived from a higher level protocol interpretation pclint : Sem(NL) → Trm(PL) by composition with an intermediate formal interpretation finltrm : Trm(PL) → Trm(XL) as trmint = finltrm ∘ pclint : Sem(NL) → Trm(PL) → Trm(XL).
It is in this precise sense, coupled with the fact that the internal model is also interpreted as an element for PL corresponding to a term V(p)_ENV = pclint(c(p),ENV) ∈ Trm(PL) relative to the metatype ENV ∈ Typ(PL) (indicating "environment"), that PL constitutes an effective formal metalanguage for XL, whereas NL is an informal metalanguage for XL.
In any case, the semantic representations x(p)_T ∈ Sem(NL) are interpreted as formal terms v(p)_T = trmint(x(p)_T) = finltrm ∘ pclint(x(p)_T) ∈ Trm(XL). These terms then designate formal elements h(p)_T = modref_{M(p)}(v(p)_T) ∈ M(p) by means of the model reference map modref_{M(p)} : Trm(XL) → M(p). Finally, these elements are structured by the syntactic complex q(p) into the formal configuration K(p) ⊆ M(p), which is exactly described by the formal expression φ(p) [equation given as Figure imgf000022_0003], where u(p) = q(p)⊗semrep(q(p)) ∈ Tns(NL) is the semantic tensor complex representing p. More precisely, K(p) is the minimal substructure of M(p) satisfying φ(p), i.e. M(p) ⊇ K(p) |= φ(p).
Again, in the simplest case conforming to MEAO semantics, the structure K(p) consists of
a) an object A = modref_{M(p)}(v(p)_DMN) ∈ Obj(M(p)) for a term v(p)_DMN ∈ Trm(XL) of metatype DMN ∈ Typ(PL)
b) an object B = modref_{M(p)}(v(p)_RNG) ∈ Obj(M(p)) for a term v(p)_RNG ∈ Trm(XL) of metatype RNG ∈ Typ(PL)
c) a map f = modref_{M(p)}(v(p)_MAP) ∈ Map(M(p)) for a term v(p)_MAP ∈ Trm(XL) of metatype MAP ∈ Typ(PL)
d) an element x_A = modref_{M(p)}(v(p)_ARG) ∈ A for a term v(p)_ARG ∈ Trm(XL) of metatype ARG ∈ Typ(PL)
e) an element x_B = f(x_A) ∈ B
while the formal description φ(p) is simply the statement 'x_B = f(x_A)'.
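The simplest MEAO configuration (a domain, a range, a map, and an argument) can be pictured as a small record whose single basic statement is the application of the map to the argument. The following sketch is illustrative only; the class name, field names and the use of finite sets for the objects A and B are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Any, Callable, FrozenSet


@dataclass(frozen=True)
class Configuration:
    """Hypothetical stand-in for the configuration K(p)."""
    domain: FrozenSet[Any]          # the object A
    range_: FrozenSet[Any]          # the object B
    mapping: Callable[[Any], Any]   # the map f : A -> B
    argument: Any                   # the element x_A

    def result(self) -> Any:
        """Evaluate the basic statement x_B = f(x_A), checking membership."""
        if self.argument not in self.domain:
            raise ValueError("argument lies outside the domain A")
        x_b = self.mapping(self.argument)
        if x_b not in self.range_:
            raise ValueError("f(x_A) lies outside the range B")
        return x_b


k = Configuration(
    domain=frozenset({1, 2, 3}),
    range_=frozenset({2, 4, 6}),
    mapping=lambda x: 2 * x,
    argument=3,
)
```

Calling `k.result()` evaluates the statement and returns the element of B that the configuration determines.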
As the formal description of an abstract configuration K(p) in the internal model M(p), the expression φ(p) is not an executable translate of the natural expression p; instead, it is a formal interpretation of the precise conditions, relative to an abstract model of an operational environment, upon which an effective executable form of p may be constructed. The ultimate transition to an appropriate external operational environment is properly accomplished by means of the metasemantic protocol. In essence, the construction of a metaformal expression X(p)eExp(PL) as a fully effective translation of an expression peExp(NL) is simply a machine interpretable codification of the satisfaction relation M(p) |= φ(p).
Formally, the expression X(p) = pclcod(M(p),φ(p)) ∈ Exp(PL) is constructed by means of the protocol encoding pclcod : Mod(XL)×Exp(XL) → Exp(PL) based on the protocol interpretation pclint : Sem(NL) → Trm(PL) introduced above. From this metaformal perspective, the internal model M(p) ∈ Mod(XL) appropriate for p is referenced by the term V(p)_ENV = pclint(c(p),ENV) ∈ Trm(PL), and semantic objects x(p)_T = semrep(q(p)_T) ∈ Sem(NL), which represent distinguished subterms q(p)_T = pcltrm(q(p),T) ∈ Syn(NL) of q(p) = synrep(p) ∈ Syn(NL) corresponding to metatypes T ∈ Typ(PL), are interpreted as terms w(p)_T = pclint(x(p)_T) ∈ Trm(PL). For the purpose of protocol consistency, it is significant to note that the formal terms from which φ(p) ∈ Exp(XL) is assembled are simply the interpretations v(p)_T = finltrm(w(p)_T) ∈ Trm(XL).
When PL is XMPL, the formal language associated with the universal protocol XMP, this metasyntactic construction is straightforward. Indeed, the syntax of XMPL is explicitly designed to accommodate MEAO semantics. In general, what is specified by any XMPL expression is an operational environment, a transaction mapping, a mapping domain, a mapping range, and a mapping argument. Refinements to any of these elements are indicated by appropriate nestings of subelements.
Again, in the simplest case conforming to MEAO semantics, X(p) ∈ Exp(XMPL) is the protocol statement
<XMP|<ENV|V(p)|ENV><MAP|f(p)|MAP><DMN|A(p)|DMN><RNG|B(p)|RNG><ARG|x(p)|ARG>|XMP>
where <ABC|xyz|ABC> is a term of protocol type ABC and content xyz. The basic types occurring here are
XMP = transaction protocol
ENV = operational environment
MAP = transaction mapping
DMN = mapping domain
RNG = mapping range
ARG = mapping argument
For example, the XMPL statement which results from a metaformal interpretation of the sample natural language expression being processed is
<XMP|<ENV|email|ENV><MAP|send|MAP><DMN|<ADR|user@here.net|ADR>|DMN><RNG|<ADR|bob@there.net|ADR>|RNG>
<ARG|<MSG|<SBT|<LST|Appointment|LST>|SBT>
<TXT|<LST|Are you going to go to your appointment by yourself?|LST>|TXT>|MSG>|ARG>|XMP>
using the additional protocol types
ADR = object address
MSG = message object
SBT = message subject
TXT = message text
LST = literal string
Note that the actual message text field of the XMPL statement consists of a question with the appropriate substitutions of 2nd person pronouns for the original 3rd person forms referring to the recipient "Bob". These transformations are computed as natural consequences of the syntactic relations coded into the algebraic form of the original natural language expression. In fact, all terms in the XMPL translation are computed similarly, i.e. as significant correlates of objects in the semantic complex structured by the syntactic dependencies indicated by the original natural language expression.
This metaformal syntactic scheme provides an effective universal template for abstract computations, and most significantly, for further translations into exact forms which are executable in specific operational environments.
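The nesting discipline of XMPL terms, in which each term opens and closes with its own type tag, is easy to reproduce mechanically. The sketch below assembles the sample email statement from its parts. It assumes the delimiter spelling `<ABC|content|ABC>` and writes the statement without line breaks; the helper names `term` and `xmpl_statement` are hypothetical and do not come from the specification.

```python
def term(tag: str, *content: str) -> str:
    """Wrap the concatenated content in a term of protocol type `tag`."""
    return f"<{tag}|{''.join(content)}|{tag}>"


def xmpl_statement(env: str, mapping: str, domain: str,
                   range_: str, argument: str) -> str:
    """Assemble a basic protocol statement from its five MEAO elements."""
    return term(
        "XMP",
        term("ENV", env),
        term("MAP", mapping),
        term("DMN", domain),
        term("RNG", range_),
        term("ARG", argument),
    )


message = term(
    "MSG",
    term("SBT", term("LST", "Appointment")),
    term("TXT", term("LST", "Are you going to go to your appointment by yourself?")),
)
statement = xmpl_statement(
    env="email",
    mapping="send",
    domain=term("ADR", "user@here.net"),
    range_=term("ADR", "bob@there.net"),
    argument=message,
)
```

Refinements such as the message subject and text appear simply as further nested calls to `term`, mirroring the nesting of subelements described above.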
External Processing: Refer to module 3.4 of FIG. 3.
External Representation: Refer to submodule 3.4.1 of FIG. 3.
Contact of the system with specific operational environments is made by means of an external representation extrep : Mod(XL) → Env(NL) which associates appropriate external operational environments with internal formal models. In fact, these internal structures are explicitly designed as abstract models of external environments in order to accommodate this representation; accordingly, the external representation extrep may be viewed as the inverse of an internal representation intrep : Env(NL) → Mod(XL) arising from a prior analysis of those operational environments which are relevant to the system.
For environments E ∈ Env(NL) which have their own executable language EL, this representation facilitates a translation between XL and EL by providing the semantic conditions which determine the mapping from terms of XL to the corresponding terms in EL. It is by means of this mediation in terms of its object language XL that the natural language NL, which is appropriate for informal negotiations in a wide variety of environments, becomes a metalanguage for an executable language EL, and therefore provides a basis for a meaningful operational semantics.
External Interpretation: Refer to submodule 3.4.2 of FIG. 3.
For a specific application in some external operational environment E ∈ Env(NL) with an associated executable language EL, i.e. an application relative to a structure E for which there is an executable interpretation expint_E : Exp(EL) → E, expressions of the object language XL are translated into expressions of EL by means of an external translation exttrn_E : Exp(XL) → Exp(EL). The commutative diagram shown in FIG. 7a illustrates the respective roles of the syntactic and semantic algebras for NL and the formal apparatus of XL in determining this translation.
The role of the metasemantic protocol indicated in this diagram is implicit. As an effective implementation of a formal interpretation scheme between NL and XL, it controls a critical aspect of the internal semantics of the system. As an effective translation medium between the internal formal structures of XL and the executable structures of EL associated with any operational environment E for NL, this protocol controls the external semantics of the system. The diagram shown in FIG. 7b illustrates the scope of these dual functions by highlighting the influence of the protocol on the system.
In particular, relative to an external operational environment E = extrep(M) ∈ Env(NL) modeled by an internal structure M ∈ Mod(XL), the formal translation exttrn_E : Exp(XL) → Exp(EL) is implemented by means of the protocol encoding pclcod : Mod(XL)×Exp(XL) → Exp(PL) and translation pcltrn : Exp(PL) → Exp(EL) as the composition exttrn_E(φ) = pcltrn(pclcod(M,φ)) for any formal expression φ ∈ Exp(XL); moreover, the appropriate operational environment is determined by means of the protocol representation pclrep : Exp(PL) → Env(NL) as E = pclrep(pclcod(M,φ)). In short, all external transactions are mediated by the protocol.
Indeed, the universal transaction protocol XMP easily accommodates externalization since its associated formal language XMPL naturally translates into executable languages such as SQL (Structured Query Language) and SMTPL (the language of the mail protocol SMTP), and even executable extensions of XML (Extensible Markup Language), in the manner indicated above, where EL is any of these executable languages. As such, XMPL forms a natural bridge between the internal semantics of the system and the external semantics of the environments in which it operates. The control structure of XMP then makes these external translations effective by finally facilitating in appropriate operational environments the execution of commands originally issued by the user in natural language.
More precisely, as given in the simple machine interpretable form
<XMP|<ENV|V|ENV><MAP|f|MAP><DMN|A|DMN><RNG|B|RNG><ARG|x|ARG>|XMP>
introduced above, a basic protocol statement X ∈ Exp(XMPL) encodes the instruction to execute in environment E the operation f_E with domain A_E and range B_E on the argument x_E, where the external elements
a) E = extref(X,ENV) = pclrep(X) ∈ Env(NL)
b) f_E = extref(X,MAP) = modref_E(extint_E(f)) ∈ Map(E)
c) A_E = extref(X,DMN) = modref_E(extint_E(A)) ∈ Obj(E)
d) B_E = extref(X,RNG) = modref_E(extint_E(B)) ∈ Obj(E)
e) x_E = extref(X,ARG) = modref_E(extint_E(x)) ∈ A_E
are properly interpreted by means of the uniform external reference extref : Exp(PL)×Trm(PL) → Env*(NL), or alternatively, initialized by the protocol representation pclrep : Exp(PL) → Env(NL), and then determined locally by the composition of the model reference modref_E : Trm(EL) → E with the external term interpretation extint_E : Trm(PL) → Trm(EL), both of which are defined relative to the operational environment E ∈ Env(NL). The actual external processing of this instruction is finally accomplished by application of the execution process envexc_E : Exp(EL) → E to the formal translate ξ = pcltrn(X) ∈ Exp(EL), which is the executable expression encoded in X and constructed over the terms extint_E(f), extint_E(A), extint_E(B), extint_E(x) ∈ Trm(EL). The result is a new state envexc_E(ξ) ∈ E.
For example, when interpreted according to its explicit formal specification as an email instruction to be executed under the control of the external protocol SMTP, the XMPL statement resulting from the sample expression being processed yields the result
To: Bob
From: <user>
Subject: Appointment
Text: Are you going to go to your appointment by yourself?
as a properly formatted message in the file bob@there.net, where <user> is the local (recipient's) identification of the sender (user).
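As a rough illustration of this last externalization step, the fields carried by the sample statement can be handed to an ordinary mail library. The sketch below uses Python's standard `email.message.EmailMessage` as a stand-in for the executable mail language; it is not the translation pcltrn described in the specification, and the function name `to_email` and its parameters are hypothetical.

```python
from email.message import EmailMessage


def to_email(sender: str, recipient: str, subject: str, text: str) -> EmailMessage:
    """Build a mail message from the DMN, RNG, SBT and TXT contents."""
    msg = EmailMessage()
    msg["From"] = sender        # DMN address: the user issuing the command
    msg["To"] = recipient       # RNG address: the recipient
    msg["Subject"] = subject    # SBT literal string
    msg.set_content(text)       # TXT literal string
    return msg


msg = to_email(
    sender="user@here.net",
    recipient="bob@there.net",
    subject="Appointment",
    text="Are you going to go to your appointment by yourself?",
)
```

Actually delivering such a message would be the job of a mail transport and is outside the scope of this sketch.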
APPLICATIONS AND TECHNOLOGY:
METASCRIPT is currently implemented as a translation from a natural language (English) into the formal language (XMPL) associated with a universal transaction protocol (XMP: external Media Protocol). In turn, this formal language is suitable for interpretation by digital components in external operational environments into executable machine instructions. Thus METASCRIPT allows a human user to communicate naturally in an effective manner with (and through) any programmable device, and hence networked configurations of such devices, compatible with the protocol XMP. In this capacity, METASCRIPT is a natural language interface to any sufficiently capable digital environment, whether it be a single device such as a computer, a cellular phone, a PDA (Personal Digital Assistant), a kitchen appliance, an automobile, or a whole network of such devices such as a local intranet or the global Internet. As a complete NLP system, the combined technologies of METASCRIPT and XMP enable a seamless integration of all participants, human and digital alike, into an effective ubiquitous network.
The fundamental algorithm upon which METASCRIPT is based employs a reduction to formal syntactic structures over terms defined in an extensible lexicon. This term reduction incorporates both syntactic type and semantic context to achieve an effective formal representation and interpretation of the meaning conveyed by any natural language expression. Extensibility of the lexicon under specific user direction provides the capacity for the system to expand its knowledge of vocabulary and usage, and consequently, offers an effective mechanism under user control for establishing definite incremental enhancements to the system's linguistic capabilities, hence substantially increasing the system's familiarity with (and competence in) particular operational environments.
In addition, the system automatically gains functional complexity through its object-oriented semantics, whereby the addition of formal terms having composite objects generated by algebraic representations of natural linguistic terms as direct references permits unlimited efficiency and sophistication of machine comprehensible natural language usage. Put simply, the system learns as it goes. Moreover, any desired level of syntactic disambiguation is attainable by increasing the local dimensionality of the underlying reduction matrix, though this feature is part of the underlying algorithm, and therefore independent of user modulation.
Finally, while the invention has been particularly shown and described with reference to preferred embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention.

Claims

CLAIMS:
1) A natural language processing apparatus for translating natural language into a formal language executable on a programmable device, said system comprising, a) memory for storing data; b) a data processor; c) an input device for presenting natural language text to said system; d) a text parser for partitioning said text into a sequence of sequences of strings of characters or pretokens; e) a lexicon for storing lexical terms as tokens associated with lexical type and reference data; f) a lexical type assignment process for assigning lexical types to pretokens by comparison to terms in the lexicon; g) a lexical insertion processor for inserting terms into the lexicon under specific control; h) a control processor for invoking lexical insertions under the condition that a pretoken is not recognized as a lexical token; i) a type contextualization processor by which refined lexical types may be reassigned to tokens depending on syntactic context; j) a type reduction matrix; k) a term reduction processor which uses said type reduction matrix to determine proper syntactic dependencies between tokens in a sentence;
l) a term inversion processor for constructing chains of syntactic dependencies among lexical terms in an expression and for determining the proper dependencies between those chains; m) a syntactic tree generation processor for constructing syntactic trees representing the syntactic structure of each processed expression; n) a syntactic algebra comprising syntactic terms formally representing processed expressions; o) a syntactic representation processor for constructing syntactic terms to represent the formal syntactic structure of processed expressions; p) a semantic algebra comprising semantic objects as formal references of appropriate terms in the syntactic algebra; q) a semantic representation processor for associating internal semantic object references with terms in the syntactic algebra; r) a semantic tensor algebra comprising correlated pairs of syntactic algebraic terms and their semantic object representations; s) a formal representation processor for associating appropriate internal formal models with terms in the semantic tensor algebra; t) a formal interpretation processor for transforming terms in the syntactic algebra into equivalent expressions in an internal formal language; u) an external representation processor for associating external operational environments with internal formal models; v) an external interpretation processor for translating expressions in an internal formal language into equivalent formal expressions executable in appropriate external operational environments.
2) A method for translating natural language into a formal language executable on a programmable device, said method comprising the steps of: a) receiving natural language text; b) parsing said text into a sequence of sequences of pretokens; c) recognizing pretokens as tokens in the lexicon; d) inserting new terms into the lexicon under specific control; e) assigning types to pretokens to form lexical terms for further syntactic processing; f) reassigning lexical types to tokens based on syntactic context; g) correlating terms occurring in a set of expressions in order to replace indirect references by appropriate direct references; h) establishing syntactic dependencies between terms in an expression through a process of term reduction; i) constructing chains of syntactic dependencies and determining dependencies between those chains, by a process of term inversion; j) generating syntactic trees which represent the syntactic structures of said processed expressions; k) representing said processed expressions as terms in a syntactic algebra;
l) representing terms in the syntactic algebra as objects in the semantic algebra; m) combining objects in the semantic algebra by means of a semantic product on pairs of semantic objects to form more complex semantic objects; n) representing correlated syntactic algebraic terms and semantic objects as terms in a semantic tensor algebra; o) representing terms in the semantic tensor algebra as internal formal models; p) transforming terms in the syntactic algebra into equivalent expressions in an internal formal language; q) associating external operation environments with internal formal models; and r) translating expressions of the internal formal language into equivalent formal expressions executable in an external operational environment.
3) In a natural language processing apparatus for translating natural language into a formal language executable on a programmable device, wherein said system includes processing means; input means for presenting natural language text to said system; a lexicon of terms; a text parser which partitions expressions into sequences of sequences of pretokens; a type assignment process for assigning syntactic types to pretokens by comparison to lexical terms in the lexicon and determining their status as tokens; a type contextualization process for reassigning lexical types to tokens based on syntactic context, a term correlation process for correlating terms occurring in a set of expressions in order to replace indirect references by direct references, said system comprising a) a type reduction matrix; b) a term reduction processor that uses the type reduction matrix to determine proper syntactic dependencies between tokens in an expression; c) a term inversion processor for constructing chains of syntactic dependencies among lexical terms in an expression and for determining the proper dependencies between those chains; d) a syntactic tree generation processor for constructing syntactic trees representing the syntactic structures of expressions; e) a syntactic algebra comprising syntactic terms formally representing processed expressions; f) a syntactic representation processor for constructing syntactic algebraic terms representing processed expressions; g) a semantic object algebra comprising semantic objects as internal references of terms in the syntactic algebra; h) a semantic product processor by which objects in the semantic object algebra are combined to form more complex semantic objects; i) a semantic representation processor by which internal semantic algebraic objects representing terms in the syntactic algebra are constructed; j) a semantic tensor algebra comprising correlated syntactic terms and semantic objects; k) a formal representation processor by which internal formal models are associated with terms in the semantic tensor algebra;
l) a formal interpretation processor by which syntactic algebraic terms are transformed into equivalent expressions in an internal formal language; m) a semantic product processor by which objects in the semantic algebra are combined to form more complex semantic objects; n) an external representation processor by which external operational environments are associated with internal formal models; and o) an external interpretation processor by which expressions in an internal formal language are translated into equivalent formal expressions executable in an external environment.
4) A software system for translating natural language into a formal language executable on a programmable device, wherein said system includes processing means; input means for presenting natural language text to said system; a lexicon of terms; a text parser which partitions natural language texts into sequences of sequences of pretokens; a type assignment process for assigning syntactic types to pretokens by comparison to lexical terms in the lexicon and determining their status as tokens; a type contextualization process for reassigning lexical types to tokens based on syntactic context; a term correlation process for correlating terms occurring in a set of expressions in order to replace indirect references by direct references, a) a type reduction matrix; b) a term reduction process which uses the type reduction matrix to determine proper syntactic dependencies between tokens in an expression; c) a term inversion process for constructing chains of syntactic dependencies among lexical terms in an expression and for determining the proper dependencies between those chains; d) a syntactic tree generation process by which syntactic trees representing the syntactic structures of expressions are constructed; e) a syntactic algebra comprising syntactic terms formally representing processed expressions; f) a syntactic representation process by which syntactic algebraic terms representing processed expressions are constructed; g) a semantic object algebra comprising semantic objects as internal references of terms in the syntactic algebra; h) a semantic object algebra comprising semantic objects as formal references of terms in the syntactic algebra; i) a semantic representation process by which internal semantic algebraic objects representing appropriate terms in the syntactic algebra are constructed; j) a semantic product process by which objects in the semantic algebra are combined to form more complex semantic objects; k) a formal representation process by which internal formal models object references are associated with terms in the semantic tensor algebra; j) a formal interpretation process by which syntactic algebraic terms are transformed into equivalent expressions in an internal formal language;
1) an external representation process by which appropriate external operation environments are associated with internal formal models; and
1) an external interpretation process by which expressions in an internal formal language are translated into equivalent formal expressions executable in an external operational environment.
5) A software system for a data processing device used in translating natural language into executable expressions in a formal language, wherein said data processing device includes a data processor and memory; input means for presenting natural language text to said system; a lexicon of terms; a text parser which partitions natural language texts into sequences of sequences of pretokens; a type assignment processor for assigning syntactic types to pretokens by comparison to lexical terms in the lexicon and determining their status as tokens; a type contextualization processor for reassigning lexical types to tokens based on syntactic context; a term correlation processor for correlating terms occurring in a set of expressions in order to replace indirect references by direct references; said software system comprising, a) a type reduction matrix for processing said expressions; b) a term reduction processor that uses the type reduction matrix to determine proper syntactic dependencies between tokens in an expression; c) a term inversion processor for constructing chains of syntactic dependencies among lexical terms in an expression and for determining the proper dependencies between those chains; d) a syntactic tree generation processor by which syntactic trees representing the syntactic structures of expressions are constructed; e) a syntactic algebra comprising syntactic terms formally representing said processed expressions; f) a syntactic representation processor by means of which syntactic algebraic terms representing processed expressions are constructed; g) a semantic object algebra comprising semantic objects as internal references of terms in the syntactic algebra; h) a semantic representation processor by which internal semantic algebraic objects representing terms in the syntactic algebra are constructed; i) a semantic product processor by which objects in the semantic algebra are combined to form more complex semantic objects; j) a formal representation processor by which internal formal models are associated with terms in the semantic tensor algebra; k) a formal interpretation processor by which syntactic algebraic terms are transformed into equivalent expressions in an internal formal language; l) an external representation processor by which external operational environments are associated with internal formal models; and m) an external interpretation processor by which expressions in an internal formal language are translated into equivalent formal expressions executable in an external operational environment.
6) A system as in claim 5 further including a protocol by means of which a) selected ones of said internal formal models are associated with terms in said semantic tensor algebra; b) syntactic algebraic terms are transformed into equivalent expressions in the internal formal language; c) selected external operational environments are associated with selected formal models; and d) expressions in the internal formal language are translated into equivalent formal expressions executable in an external operational environment.
7) A system as in claim 5 further comprising a) a lexical insertion processor for inserting lexical terms into the lexicon under user control whereby said lexicon can be expanded and refined; and b) a controller for invoking lexical insertions under the condition that a pretoken is not recognized as a lexical token.
8) A system as in claim 7 further including a process control for inserting external lexical information to said lexicon to enable the system to learn new lexical information including vocabulary and associated lexical type and reference relations.
9) A data processing system for translating a natural language into a language executable as a formal machine language comprising, in combination, a) input devices for inputting a natural language text to said system; b) text processing components for providing an output comprising a sequence of preexpressions based on said text; c) a syntactic processing component receiving said preexpressions and providing a sequence of syntactic complexes; d) semantic processing components for receiving said sequence of syntactic complexes and providing a sequence of formal expressions; and e) external processing components for providing a sequence of executable expressions to an external operational environment based on said formal expressions.
10) A method of translating a natural language into a language executable as a formal or machine language comprising the steps of, a) inputting a natural language text to a data processing system; b) providing an output comprising a sequence of preexpressions based on said text; c) receiving said preexpressions and providing a sequence of syntactic complexes; d) receiving said sequence of syntactic complexes and providing a sequence of formal expressions; and e) providing a sequence of executable expressions as an external operational structure based on said formal expressions.
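The front-end elements recited in claims 4, 5 and 7 (a text parser producing sequences of sequences of pretokens, type assignment against a lexicon, and lexical insertion when a pretoken is not recognized) can be illustrated with a minimal Python sketch. This is not the claimed implementation: the names `LEXICON`, `parse_text`, `assign_types` and `on_unknown`, the type labels, and the sentence-splitting rule are all assumptions introduced here for illustration only.

```python
import re
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical toy lexicon mapping terms to syntactic type labels.
# The labels are illustrative and are not taken from the patent.
LEXICON: Dict[str, str] = {
    "open": "verb",
    "print": "verb",
    "the": "determiner",
    "file": "noun",
}


def parse_text(text: str) -> List[List[str]]:
    """Partition a text into a sequence of sequences of pretokens.

    Assumption: expressions end at '.', '!' or '?', and a pretoken is a
    lowercase run of letters, digits or apostrophes.
    """
    expressions = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [re.findall(r"[a-z0-9']+", s.lower()) for s in expressions]


def assign_types(
    pretokens: List[str],
    lexicon: Dict[str, str],
    on_unknown: Optional[Callable[[str], Optional[str]]] = None,
) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Assign a syntactic type to each pretoken by lexicon lookup.

    A pretoken found in the lexicon becomes a (term, type) token. When a
    pretoken is not recognized, the optional on_unknown callback may
    supply a type, which is inserted into the lexicon (a stand-in for
    lexical insertion under user control). Anything still unknown is
    returned separately.
    """
    tokens: List[Tuple[str, str]] = []
    unrecognized: List[str] = []
    for pretoken in pretokens:
        if pretoken not in lexicon and on_unknown is not None:
            new_type = on_unknown(pretoken)
            if new_type is not None:
                lexicon[pretoken] = new_type  # lexical insertion
        if pretoken in lexicon:
            tokens.append((pretoken, lexicon[pretoken]))
        else:
            unrecognized.append(pretoken)
    return tokens, unrecognized


if __name__ == "__main__":
    lexicon = dict(LEXICON)  # work on a copy so the toy lexicon is unchanged
    for pretokens in parse_text("Open the file. Print the report!"):
        print(assign_types(pretokens, lexicon, on_unknown=lambda term: "noun"))
```

The later stages recited in the claims (type contextualization, term correlation, the type reduction matrix, and the syntactic and semantic algebras) are not modeled here, because the claims alone do not specify them in enough detail to sketch without guessing.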
PCT/GB2002/002742 2001-06-18 2002-06-12 Computer system with natural language to machine language translator WO2002103555A2 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP02732949A EP1397753A2 (en) 2001-06-18 2002-06-12 Computer system with natural language to machine language translator
AU2002304439A AU2002304439A1 (en) 2001-06-18 2002-06-12 Computer system with natural language to machine language translator

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US09/883,693 2001-06-18
US09/883,693 US7085708B2 (en) 2000-09-23 2001-06-18 Computer system with natural language to machine language translator

Publications (2)

Publication Number Publication Date
WO2002103555A2 (en) 2002-12-27
WO2002103555A3 (en) 2003-11-06

Family

ID=25383137

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/GB2002/002742 WO2002103555A2 (en) 2001-06-18 2002-06-12 Computer system with natural language to machine language translator

Country Status (4)

Country Link
US (1) US7085708B2 (en)
EP (1) EP1397753A2 (en)
AU (1) AU2002304439A1 (en)
WO (1) WO2002103555A2 (en)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2005041033A2 (en) * 2003-10-23 2005-05-06 Honeywell International Inc. Method and apparatus for a hierarchical object model-based constrained language interpreter-parser
CN102603963A (en) * 2012-03-27 2012-07-25 陕西科技大学 Preparation method of environmentally-friendly polyacrylic ester pigment printing adhesive
CN102807648A (en) * 2012-09-05 2012-12-05 陕西科技大学 Method for preparing high-elasticity adhesive for fabric by adopting nuclear shell emulsion polymerization method
CN102911308A (en) * 2012-11-19 2013-02-06 陕西科技大学 Method for preparing fluorine contained polyacrylate/dual-sized nano SiO2 composite emulsion
CN106381709A (en) * 2016-09-05 2017-02-08 南通纺织丝绸产业技术研究院 Super-hydrophobic and anti-ultraviolet finishing agent used for textiles, and preparation method and application thereof
EP3685284A4 (en) * 2017-09-22 2021-06-16 Intuit Inc. Lean parsing: a natural language processing system and method for parsing domain-specific languages
US11222266B2 (en) 2016-07-15 2022-01-11 Intuit Inc. System and method for automatic learning of functions
US11520975B2 (en) 2016-07-15 2022-12-06 Intuit Inc. Lean parsing: a natural language processing system and method for parsing domain-specific languages
US11663677B2 (en) 2016-07-15 2023-05-30 Intuit Inc. System and method for automatically generating calculations for fields in compliance forms
US11687721B2 (en) 2019-05-23 2023-06-27 Intuit Inc. System and method for recognizing domain specific named entities using domain specific word embeddings
WO2023154392A1 (en) * 2022-02-14 2023-08-17 Google Llc Conversation graph navigation with language model
US11783128B2 (en) 2020-02-19 2023-10-10 Intuit Inc. Financial document text conversion to computer readable operations

Families Citing this family (137)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8214196B2 (en) 2001-07-03 2012-07-03 University Of Southern California Syntax-based statistical translation model
WO2003060660A2 (en) * 2002-01-10 2003-07-24 Exobrain, Inc. Meaning processor
US7343596B1 (en) * 2002-03-19 2008-03-11 Dloo, Incorporated Method and system for creating self-assembling components
WO2004001623A2 (en) 2002-03-26 2003-12-31 University Of Southern California Constructing a translation lexicon from comparable, non-parallel corpora
US7117487B2 (en) * 2002-05-10 2006-10-03 Microsoft Corporation Structural equivalence of expressions containing processes and queries
US20030212761A1 (en) * 2002-05-10 2003-11-13 Microsoft Corporation Process kernel
US7398209B2 (en) 2002-06-03 2008-07-08 Voicebox Technologies, Inc. Systems and methods for responding to natural language speech utterance
US7502730B2 (en) * 2002-06-14 2009-03-10 Microsoft Corporation Method and apparatus for federated understanding
US7693720B2 (en) * 2002-07-15 2010-04-06 Voicebox Technologies, Inc. Mobile systems and methods for responding to natural language speech utterance
US8321235B2 (en) * 2002-11-27 2012-11-27 Hewlett-Packard Development Company, L.P. Validating an electronic transaction
US20040122653A1 (en) * 2002-12-23 2004-06-24 Mau Peter K.L. Natural language interface semantic object module
US8548794B2 (en) * 2003-07-02 2013-10-01 University Of Southern California Statistical noun phrase translation
WO2005041925A2 (en) * 2003-10-31 2005-05-12 Alza Corporation Compositions and dosage forms for enhanced absorption
US20050125486A1 (en) * 2003-11-20 2005-06-09 Microsoft Corporation Decentralized operating system
US7418378B2 (en) * 2003-12-22 2008-08-26 Microsoft Corporation Method and apparatus for training and deployment of a statistical model of syntactic attachment likelihood
US8296126B2 (en) * 2004-02-25 2012-10-23 Research In Motion Limited System and method for multi-lingual translation
US8296127B2 (en) 2004-03-23 2012-10-23 University Of Southern California Discovery of parallel text portions in comparable collections of corpora and training using comparable texts
US8666725B2 (en) 2004-04-16 2014-03-04 University Of Southern California Selection and use of nonstatistical translation components in a statistical machine translation framework
US7761858B2 (en) * 2004-04-23 2010-07-20 Microsoft Corporation Semantic programming language
US7689410B2 (en) * 2004-04-23 2010-03-30 Microsoft Corporation Lexical semantic structure
JP5452868B2 (en) 2004-10-12 2014-03-26 ユニヴァーシティー オブ サザン カリフォルニア Training for text-to-text applications that use string-to-tree conversion for training and decoding
US20060083431A1 (en) * 2004-10-20 2006-04-20 Bliss Harry M Electronic device and method for visual text interpretation
US20060139687A1 (en) * 2004-12-28 2006-06-29 Brother Kogyo Kabushiki Kaisha Contents providing system, client device, server and program
US7707131B2 (en) * 2005-03-08 2010-04-27 Microsoft Corporation Thompson strategy based online reinforcement learning system for action selection
US7734471B2 (en) 2005-03-08 2010-06-08 Microsoft Corporation Online learning for dialog systems
US7885817B2 (en) * 2005-03-08 2011-02-08 Microsoft Corporation Easy generation and automatic training of spoken dialog systems using text-to-speech
US8676563B2 (en) 2009-10-01 2014-03-18 Language Weaver, Inc. Providing human-generated and machine-generated trusted translations
US8886517B2 (en) 2005-06-17 2014-11-11 Language Weaver, Inc. Trust scoring for language translation systems
US7689411B2 (en) * 2005-07-01 2010-03-30 Xerox Corporation Concept matching
US7809551B2 (en) * 2005-07-01 2010-10-05 Xerox Corporation Concept matching system
US7640160B2 (en) 2005-08-05 2009-12-29 Voicebox Technologies, Inc. Systems and methods for responding to natural language speech utterance
US7620549B2 (en) 2005-08-10 2009-11-17 Voicebox Technologies, Inc. System and method of supporting adaptive misrecognition in conversational speech
US7949529B2 (en) 2005-08-29 2011-05-24 Voicebox Technologies, Inc. Mobile systems and methods of supporting natural language human-machine interactions
US8805675B2 (en) * 2005-11-07 2014-08-12 Sap Ag Representing a computer system state to a user
US7840451B2 (en) * 2005-11-07 2010-11-23 Sap Ag Identifying the most relevant computer system state information
US10319252B2 (en) 2005-11-09 2019-06-11 Sdl Inc. Language capability assessment and training apparatus and techniques
US7979295B2 (en) * 2005-12-02 2011-07-12 Sap Ag Supporting user interaction with a computer system
US7676489B2 (en) * 2005-12-06 2010-03-09 Sap Ag Providing natural-language interface to repository
US8082496B1 (en) * 2006-01-26 2011-12-20 Adobe Systems Incorporated Producing a set of operations from an output description
US8229733B2 (en) * 2006-02-09 2012-07-24 John Harney Method and apparatus for linguistic independent parsing in a natural language systems
US8943080B2 (en) 2006-04-07 2015-01-27 University Of Southern California Systems and methods for identifying parallel documents and sentence fragments in multilingual document collections
FR2902913A1 (en) * 2006-06-21 2007-12-28 France Telecom Semantic and spatial similarity note calculating and encoding method for tourism field, involves calculating and encoding semantic and spatial note by relatively comparing with respective common semantic characteristics
US8886518B1 (en) 2006-08-07 2014-11-11 Language Weaver, Inc. System and method for capitalizing machine translated text
US20080086298A1 (en) * 2006-10-10 2008-04-10 Anisimovich Konstantin Method and system for translating sentences between langauges
US8214199B2 (en) * 2006-10-10 2012-07-03 Abbyy Software, Ltd. Systems for translating sentences between languages using language-independent semantic structures and ratings of syntactic constructions
US9984071B2 (en) 2006-10-10 2018-05-29 Abbyy Production Llc Language ambiguity detection of text
US9645993B2 (en) 2006-10-10 2017-05-09 Abbyy Infopoisk Llc Method and system for semantic searching
US8145473B2 (en) 2006-10-10 2012-03-27 Abbyy Software Ltd. Deep model statistics method for machine translation
US9235573B2 (en) 2006-10-10 2016-01-12 Abbyy Infopoisk Llc Universal difference measure
US8195447B2 (en) * 2006-10-10 2012-06-05 Abbyy Software Ltd. Translating sentences between languages using language-independent semantic structures and ratings of syntactic constructions
US9633005B2 (en) 2006-10-10 2017-04-25 Abbyy Infopoisk Llc Exhaustive automatic processing of textual information
US8548795B2 (en) * 2006-10-10 2013-10-01 Abbyy Software Ltd. Method for translating documents from one language into another using a database of translations, a terminology dictionary, a translation dictionary, and a machine translation system
US9047275B2 (en) 2006-10-10 2015-06-02 Abbyy Infopoisk Llc Methods and systems for alignment of parallel text corpora
US8073681B2 (en) 2006-10-16 2011-12-06 Voicebox Technologies, Inc. System and method for a cooperative conversational voice user interface
DE102006050112A1 (en) * 2006-10-25 2008-04-30 Dspace Digital Signal Processing And Control Engineering Gmbh Requirement description e.g. test specification, creating method for embedded system i.e. motor vehicle control device, involves automatically representing modules, and assigning to classes in particular unified modeling language classes
US8433556B2 (en) 2006-11-02 2013-04-30 University Of Southern California Semi-supervised training for statistical word alignment
US9122674B1 (en) 2006-12-15 2015-09-01 Language Weaver, Inc. Use of annotations in statistical machine translation
US8468149B1 (en) 2007-01-26 2013-06-18 Language Weaver, Inc. Multi-lingual online community
US7818176B2 (en) 2007-02-06 2010-10-19 Voicebox Technologies, Inc. System and method for selecting and presenting advertisements based on natural language processing of voice-based input
US8615389B1 (en) 2007-03-16 2013-12-24 Language Weaver, Inc. Generation and exploitation of an approximate language model
US8959011B2 (en) 2007-03-22 2015-02-17 Abbyy Infopoisk Llc Indicating and correcting errors in machine translation systems
US8831928B2 (en) 2007-04-04 2014-09-09 Language Weaver, Inc. Customizable machine translation service
US8825466B1 (en) 2007-06-08 2014-09-02 Language Weaver, Inc. Modification of annotated bilingual segment pairs in syntax-based machine translation
US8812296B2 (en) 2007-06-27 2014-08-19 Abbyy Infopoisk Llc Method and system for natural language dictionary generation
US8140335B2 (en) 2007-12-11 2012-03-20 Voicebox Technologies, Inc. System and method for providing a natural language voice user interface in an integrated voice navigation services environment
US8706477B1 (en) 2008-04-25 2014-04-22 Softwin Srl Romania Systems and methods for lexical correspondence linguistic knowledge base creation comprising dependency trees with procedural nodes denoting execute code
US8521512B2 (en) * 2008-04-30 2013-08-27 Deep Sky Concepts, Inc Systems and methods for natural language communication with a computer
US8589161B2 (en) 2008-05-27 2013-11-19 Voicebox Technologies, Inc. System and method for an integrated, multi-modal, multi-device natural language voice services environment
US9305548B2 (en) 2008-05-27 2016-04-05 Voicebox Technologies Corporation System and method for an integrated, multi-modal, multi-device natural language voice services environment
US20090295836A1 (en) * 2008-05-27 2009-12-03 Ravenflow, Inc. System and method for representing large activity diagrams
US20090326925A1 (en) * 2008-06-27 2009-12-31 Microsoft Corporation Projecting syntactic information using a bottom-up pattern matching algorithm
US9262409B2 (en) 2008-08-06 2016-02-16 Abbyy Infopoisk Llc Translation of a selected text fragment of a screen
US8155949B1 (en) 2008-10-01 2012-04-10 The United States Of America As Represented By The Secretary Of The Navy Geodesic search and retrieval system and method of semi-structured databases
US8326637B2 (en) 2009-02-20 2012-12-04 Voicebox Technologies, Inc. System and method for processing multi-modal device interactions in a natural language voice services environment
US9805020B2 (en) 2009-04-23 2017-10-31 Deep Sky Concepts, Inc. In-context access of stored declarative knowledge using natural language expression
US8214366B2 (en) * 2009-11-17 2012-07-03 Glace Holding Llc Systems and methods for generating a language database that can be used for natural language communication with a computer
US8972445B2 (en) 2009-04-23 2015-03-03 Deep Sky Concepts, Inc. Systems and methods for storage of declarative knowledge accessible by natural language in a computer capable of appropriately responding
US8275788B2 (en) * 2009-11-17 2012-09-25 Glace Holding Llc System and methods for accessing web pages using natural language
US8762130B1 (en) 2009-06-17 2014-06-24 Softwin Srl Romania Systems and methods for natural language processing including morphological analysis, lemmatizing, spell checking and grammar checking
US8762131B1 (en) 2009-06-17 2014-06-24 Softwin Srl Romania Systems and methods for managing a complex lexicon comprising multiword expressions and multiword inflection templates
US8990064B2 (en) 2009-07-28 2015-03-24 Language Weaver, Inc. Translating documents based on content
US8380486B2 (en) 2009-10-01 2013-02-19 Language Weaver, Inc. Providing machine-generated translations and corresponding trust levels
US9171541B2 (en) 2009-11-10 2015-10-27 Voicebox Technologies Corporation System and method for hybrid processing in a natural language voice services environment
CN101739395A (en) * 2009-12-31 2010-06-16 程光远 Machine translation method and system
US10417646B2 (en) 2010-03-09 2019-09-17 Sdl Inc. Predicting the cost associated with translating textual content
US8949773B2 (en) * 2010-03-25 2015-02-03 International Business Machines Corporation Deriving process models from natural language use case models
MX352407B (en) * 2010-08-05 2017-11-23 R Galassi Christopher System and method for multi-dimensional knowledge representation.
RU2010151821A (en) * 2010-12-17 2012-06-27 Виталий Евгеньевич Пилкин (RU) METHOD FOR AUTOMATED TRANSFER OF INFORMATION
US8688453B1 (en) * 2011-02-28 2014-04-01 Nuance Communications, Inc. Intent mining via analysis of utterances
US11003838B2 (en) 2011-04-18 2021-05-11 Sdl Inc. Systems and methods for monitoring post translation editing
US8694303B2 (en) 2011-06-15 2014-04-08 Language Weaver, Inc. Systems and methods for tuning parameters in statistical machine translation
US8838434B1 (en) * 2011-07-29 2014-09-16 Nuance Communications, Inc. Bootstrap call router to other languages using selected N-best translations
US8886515B2 (en) 2011-10-19 2014-11-11 Language Weaver, Inc. Systems and methods for enhancing machine translation post edit review processes
US8942973B2 (en) 2012-03-09 2015-01-27 Language Weaver, Inc. Content page URL translation
US8989485B2 (en) 2012-04-27 2015-03-24 Abbyy Development Llc Detecting a junction in a text line of CJK characters
US8971630B2 (en) 2012-04-27 2015-03-03 Abbyy Development Llc Fast CJK character recognition
EP2856344A1 (en) * 2012-05-24 2015-04-08 IQser IP AG Generation of queries to a data processing system
US10261994B2 (en) 2012-05-25 2019-04-16 Sdl Inc. Method and system for automatic management of reputation of translators
US9436382B2 (en) 2012-09-18 2016-09-06 Adobe Systems Incorporated Natural language image editing
US9412366B2 (en) 2012-09-18 2016-08-09 Adobe Systems Incorporated Natural language image spatial and tonal localization
US9588964B2 (en) 2012-09-18 2017-03-07 Adobe Systems Incorporated Natural language vocabulary generation and usage
US10656808B2 (en) 2012-09-18 2020-05-19 Adobe Inc. Natural language and user interface controls
US9619459B2 (en) * 2012-10-01 2017-04-11 Nuance Communications, Inc. Situation aware NLU/NLP
US9152622B2 (en) 2012-11-26 2015-10-06 Language Weaver, Inc. Personalized machine translation via online adaptation
US10282419B2 (en) 2012-12-12 2019-05-07 Nuance Communications, Inc. Multi-domain natural language processing architecture
IN2013CH01237A (en) 2013-03-21 2015-08-14 Infosys Ltd
US10579835B1 (en) * 2013-05-22 2020-03-03 Sri International Semantic pre-processing of natural language input in a virtual personal assistant
US9213694B2 (en) 2013-10-10 2015-12-15 Language Weaver, Inc. Efficient online domain adaptation
US20150169285A1 (en) * 2013-12-18 2015-06-18 Microsoft Corporation Intent-based user experience
RU2592395C2 (en) 2013-12-19 2016-07-20 Общество с ограниченной ответственностью "Аби ИнфоПоиск" Resolution semantic ambiguity by statistical analysis
RU2586577C2 (en) 2014-01-15 2016-06-10 Общество с ограниченной ответственностью "Аби ИнфоПоиск" Filtering arcs parser graph
US10657296B2 (en) * 2014-05-09 2020-05-19 Autodesk, Inc. Techniques for using controlled natural language to capture design intent for computer-aided design
RU2596600C2 (en) 2014-09-02 2016-09-10 Общество с ограниченной ответственностью "Аби Девелопмент" Methods and systems for processing images of mathematical expressions
US9898459B2 (en) 2014-09-16 2018-02-20 Voicebox Technologies Corporation Integration of domain information into state transitions of a finite state transducer for natural language processing
EP3195145A4 (en) 2014-09-16 2018-01-24 VoiceBox Technologies Corporation Voice commerce
CN107003999B (en) 2014-10-15 2020-08-21 声钰科技 System and method for subsequent response to a user's prior natural language input
US20160124937A1 (en) * 2014-11-03 2016-05-05 Service Paradigm Pty Ltd Natural language execution system, method and computer readable medium
JP2016095698A (en) * 2014-11-14 2016-05-26 日本電信電話株式会社 Translation learning device, translation device, method, and program
US9792095B2 (en) 2014-11-25 2017-10-17 Symbol Technologies, Llc Apparatus and method for converting a procedure manual to an automated program
US9626358B2 (en) 2014-11-26 2017-04-18 Abbyy Infopoisk Llc Creating ontologies by analyzing natural language texts
US10614799B2 (en) 2014-11-26 2020-04-07 Voicebox Technologies Corporation System and method of providing intent predictions for an utterance prior to a system detection of an end of the utterance
US10431214B2 (en) 2014-11-26 2019-10-01 Voicebox Technologies Corporation System and method of determining a domain and/or an action related to a natural language input
CN105701120B (en) 2014-11-28 2019-05-03 华为技术有限公司 The method and apparatus for determining semantic matching degree
US10133728B2 (en) * 2015-03-20 2018-11-20 Microsoft Technology Licensing, Llc Semantic parsing for complex knowledge extraction
US9766868B2 (en) 2016-01-29 2017-09-19 International Business Machines Corporation Dynamic source code generation
US9619209B1 (en) 2016-01-29 2017-04-11 International Business Machines Corporation Dynamic source code generation
US10380258B2 (en) * 2016-03-31 2019-08-13 International Business Machines Corporation System, method, and recording medium for corpus pattern paraphrasing
US10331784B2 (en) 2016-07-29 2019-06-25 Voicebox Technologies Corporation System and method of disambiguating natural language processing requests
US10984195B2 (en) 2017-06-23 2021-04-20 General Electric Company Methods and systems for using implied properties to make a controlled-english modelling language more natural
FR3077656A1 (en) * 2018-02-07 2019-08-09 Christophe Leveque METHOD FOR TRANSFORMING A SEQUENCE TO MAKE IT EXECUTABLE BY A MACHINE
US10936810B2 (en) * 2018-12-04 2021-03-02 International Business Machines Corporation Token embedding based on target-context pairs
RU2708213C1 (en) * 2019-01-09 2019-12-04 Олег Владимирович Постников Method of changing information on a search results page and a method of performing operations on text fragments
US11514246B2 (en) * 2019-10-25 2022-11-29 International Business Machines Corporation Providing semantic completeness assessment with minimal domain-specific data
CN111078216B (en) * 2019-11-08 2023-06-02 泰康保险集团股份有限公司 Information display method, information display device, electronic equipment and computer readable medium
WO2022043675A2 (en) * 2020-08-24 2022-03-03 Unlikely Artificial Intelligence Limited A computer implemented method for the automated analysis or use of data
US11681732B2 (en) 2020-12-23 2023-06-20 International Business Machines Corporation Tuning query generation patterns
CN116822517B (en) * 2023-08-29 2023-11-10 百舜信息技术有限公司 Multi-language translation term identification method

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5895466A (en) 1997-08-19 1999-04-20 At&T Corp Automated natural language understanding customer service system
US5966686A (en) 1996-06-28 1999-10-12 Microsoft Corporation Method and system for computing semantic logical forms from syntax trees
US6246977B1 (en) 1997-03-07 2001-06-12 Microsoft Corporation Information retrieval utilizing semantic representation of text and based on constrained expansion of query words

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3009215B2 (en) * 1990-11-30 2000-02-14 株式会社日立製作所 Natural language processing method and natural language processing system
JPH05324713A (en) * 1992-05-20 1993-12-07 Hitachi Ltd Method and system for natural language processing
US5682539A (en) * 1994-09-29 1997-10-28 Conrad; Donovan Anticipated meaning natural language interface
JPH08167006A (en) * 1994-12-13 1996-06-25 Canon Inc Natural language processor and its method
US5878385A (en) * 1996-09-16 1999-03-02 Ergo Linguistic Technologies Method and apparatus for universal parsing of language
US5836771A (en) * 1996-12-02 1998-11-17 Ho; Chi Fai Learning method and system based on questioning
US6108620A (en) * 1997-07-17 2000-08-22 Microsoft Corporation Method and system for natural language parsing using chunking
US6070134A (en) * 1997-07-31 2000-05-30 Microsoft Corporation Identifying salient semantic relation paths between two words
US6311150B1 (en) * 1999-09-03 2001-10-30 International Business Machines Corporation Method and system for hierarchical natural language understanding

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
"Microsoft Research: Natural Language Processing Hits High Gear", 3 May 2000, MICROSOFT CORPORATION
RICHARDSON ET AL., ACL '98, 36TH ANN. MTG. ASSOC. COMP. LING. 17TH INT. CONF. COMPU. LING., pages 1098-1102
See also references of EP1397753A2

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2005041033A2 (en) * 2003-10-23 2005-05-06 Honeywell International Inc. Method and apparatus for a hierarchical object model-based constrained language interpreter-parser
WO2005041033A3 (en) * 2003-10-23 2006-02-23 Honeywell Int Inc Method and apparatus for a hierarchical object model-based constrained language interpreter-parser
CN102603963A (en) * 2012-03-27 2012-07-25 陕西科技大学 Preparation method of environmentally-friendly polyacrylic ester pigment printing adhesive
CN102603963B (en) * 2012-03-27 2014-04-23 陕西科技大学 Preparation method of environmentally-friendly polyacrylic ester pigment printing adhesive
CN102807648A (en) * 2012-09-05 2012-12-05 陕西科技大学 Method for preparing high-elasticity adhesive for fabric by adopting nuclear shell emulsion polymerization method
CN102807648B (en) * 2012-09-05 2014-04-23 陕西科技大学 Method for preparing high-elasticity adhesive for fabric by adopting nuclear shell emulsion polymerization method
CN102911308A (en) * 2012-11-19 2013-02-06 陕西科技大学 Method for preparing fluorine contained polyacrylate/dual-sized nano SiO2 composite emulsion
CN102911308B (en) * 2012-11-19 2014-04-16 陕西科技大学 Method for preparing fluorine contained polyacrylate/dual-sized nano SiO2 composite emulsion
US11520975B2 (en) 2016-07-15 2022-12-06 Intuit Inc. Lean parsing: a natural language processing system and method for parsing domain-specific languages
US11222266B2 (en) 2016-07-15 2022-01-11 Intuit Inc. System and method for automatic learning of functions
US11663495B2 (en) 2016-07-15 2023-05-30 Intuit Inc. System and method for automatic learning of functions
US11663677B2 (en) 2016-07-15 2023-05-30 Intuit Inc. System and method for automatically generating calculations for fields in compliance forms
CN106381709B (en) * 2016-09-05 2018-07-06 南通纺织丝绸产业技术研究院 For super-hydrophobic and anti UV finishing agent, the preparation method and applications of textile
CN106381709A (en) * 2016-09-05 2017-02-08 南通纺织丝绸产业技术研究院 Super-hydrophobic and anti-ultraviolet finishing agent used for textiles, and preparation method and application thereof
EP3685284A4 (en) * 2017-09-22 2021-06-16 Intuit Inc. Lean parsing: a natural language processing system and method for parsing domain-specific languages
US11687721B2 (en) 2019-05-23 2023-06-27 Intuit Inc. System and method for recognizing domain specific named entities using domain specific word embeddings
US11783128B2 (en) 2020-02-19 2023-10-10 Intuit Inc. Financial document text conversion to computer readable operations
WO2023154392A1 (en) * 2022-02-14 2023-08-17 Google Llc Conversation graph navigation with language model

Also Published As

Publication number Publication date
WO2002103555A3 (en) 2003-11-06
EP1397753A2 (en) 2004-03-17
US7085708B2 (en) 2006-08-01
US20040181390A1 (en) 2004-09-16
AU2002304439A1 (en) 2003-01-02

Similar Documents

Publication Publication Date Title
US7085708B2 (en) Computer system with natural language to machine language translator
US9070090B2 (en) Scalable string matching as a component for unsupervised learning in semantic meta-model development
US10275424B2 (en) System and method for language extraction and encoding
Cheng et al. Learning an executable neural semantic parser
US7630981B2 (en) Method and system for learning ontological relations from documents
US20140156282A1 (en) Method and system for controlling target applications based upon a natural language command string
EP2915068A2 (en) Natural language processing system and method
Javad Hosseini et al. Learning typed entailment graphs with global soft constraints
US9043265B2 (en) Methods and systems for constructing intelligent glossaries from distinction-based reasoning
Dahl Translating Spanish into logic through logic
Chihani et al. A semantic framework for proof evidence
Eger et al. A comparison of four character-level string-to-string translation models for (OCR) spelling error correction
Patrick et al. Automated proof reading of clinical notes
Hosseini Unsupervised learning of relational entailment graphs from text
Wang Handling grammatical errors, ambiguity and impreciseness in GIS natural language queries
US20220237383A1 (en) Concept system for a natural language understanding (nlu) framework
US20220229990A1 (en) System and method for lookup source segmentation scoring in a natural language understanding (nlu) framework
Liu et al. SENET: a semantic web for supporting automation of software engineering tasks
Kim et al. A (mostly) symbolic system for monotonic inference with unscoped episodic logical forms
Agarwal et al. UNLization of Punjabi text for natural language processing applications
Saparov A probabilistic generative grammar for semantic parsing
Cohen et al. Dynamic programming algorithms as products of weighted logic programs
Basile et al. The JUMP project: Domain Ontologies and Linguistic Knowledge @ Work
Sowa Relating templates to language and logic
Suhail et al. A Bottom-Up Approach applied to Dependency Parsing in Malayalam Language

Legal Events

Date Code Title Description
AK Designated states. Kind code of ref document: A2. Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ OM PH PL PT RO RU SD SE SG SI SK SL TJ TM TN TR TT TZ UA UG US UZ VN YU ZA ZM ZW

AL Designated countries for regional patents. Kind code of ref document: A2. Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 EP: The EPO has been informed by WIPO that EP was designated in this application
WWE WIPO information: entry into national phase. Ref document number: 2002732949. Country of ref document: EP.

WWP WIPO information: published in national office. Ref document number: 2002732949. Country of ref document: EP.

REG Reference to national code. Ref country code: DE. Ref legal event code: 8642.

NENP Non-entry into the national phase. Ref country code: JP.

WWW WIPO information: withdrawn in national office. Country of ref document: JP.