CA2351406A1 - Universal translation method - Google Patents

Universal translation method

Info

Publication number
CA2351406A1
Authority
CA
Canada
Prior art keywords
source
language
target
word
computer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
CA002351406A
Other languages
French (fr)
Inventor
William E. Datig
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Publication of CA2351406A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/55 Rule-based translation

Abstract

A source signal embodying knowledge is decomposed into a simple and regular internal representation of a decomposition of epistemic instances of the knowledge. An epistemic instance is a fundamental semantic structure that expresses a transformation of two objects. The internal representation is then transformed into another internal representation from which a target signal is constructed. The complexity of language is localized within rules for decomposing the source signal into the internal representation and rules for transforming the internal representation into another internal representation.
In one embodiment, source signal decomposition and target signal construction are facilitated by lookups into a universal dictionary.

Description

UNIVERSAL TRANSLATION METHOD
RELATED APPLICATIONS
The present application contains subject matter related to the following co-pending applications:
U.S. Application Serial No. 08/847,230 entitled "Universal Epistemological Machine (a.k.a. Android)," filed on May 1, 1997 by William E. Datig (docket no.
MAM 188), the contents of which are hereby incorporated by reference herein;
U.S. Application Serial No. 08/876,378 entitled "Universal Machine Translator of Arbitrary Languages", filed on June 16, 1997 by William E. Datig (docket no.
MAM288), the contents of which are hereby incorporated by reference herein; and U.S. Application Serial No. 09/033,676 entitled "Universal Machine Translator of Arbitrary Languages", filed on March 3, 1998 by William E. Datig (docket no.
MAM388), the contents of which are hereby incorporated by reference herein.
FIELD OF THE INVENTION
The present invention relates to machine translation and, more particularly, to a universal translation system applicable for transforming signals that embody knowledge according to a knowledge representation, such as a natural language.
BACKGROUND OF THE INVENTION
There is a long felt need for a reliable, high-quality language translation system.
The increasing internationalization and globalization of the world's economies continues to bring together many people who speak different languages for business. A
significant cost and obstacle, however, continues to be the requirement to translate documents and spoken words from one language to another. In particular, it is difficult to find competent and affordable translators who are both fluent in the desired languages and understand the subject matter as well. Researchers have been investigating for some time whether and how translation of natural and artificial languages can be automated.

Perhaps the single most difficult impediment to a high-quality automated language translation system is the sheer complexity of the world's human languages.
Human languages are notoriously complex, especially in their vocabularies and grammars.
Conventional attempts to perform machine translation, however, have not been able to manage this complexity very well.
According to one approach, such as that described in U.S. Patent No. 4,706,212, software routines are hard-coded to translate sentences in a source language to sentences in a target language. In particular, the complexity of the grammar of the source and target languages is handled by various ad-hoc, hard-coded logic. For example, U.S. Patent No. 4,706,212 discloses logic for recognizing some grammatical constructions in English as a source language and outputting a Russian construction. The logic devised for recognizing and translating these source grammatical constructions, however, is tightly coupled to a particular source language. As a practical matter, most of the subroutines coded to handle English source constructions are utterly inapplicable to another language such as Chinese.
Therefore, extending such conventional translation systems to handle a new source or target language requires a virtual re-implementation of the entire system.
Furthermore, since the hard-coded logic is often quite complicated, it is difficult and expensive to debug and maintain, especially to improve the quality of the language translation.
Since handling grammatical rules by special-purpose subroutines is difficult to debug, maintain, and extend, other conventional approaches have attempted to circumvent the above difficulties by utilizing complicated internal data structures to represent the text under translation. For example, U.S. Patent No. 5,28,491 describes a system in which a graph of possible interpretations is produced according to grammar rules of a specific source language, such as English. In general, these data structures are quite complex, with a variety of node types for different grammatical constructions, especially if such a system attempts to implement the principles of Noam Chomsky's transformational grammar. Since each language employs different grammatical constructions, the data structure for one language is often not usable for another language.
Another example of a complicated internal data structure is an interlingua, which is an artificial language devised for representing a superset of the source and target languages.
Such an approach is described, for example, in U.S. Patent No. 5,426,583. In order to be useful, the interlingua must be designed to include all the features of the source and target languages. Thus, if capability for a new language is to be added to an interlingual system, then the interlingua typically needs to be upgraded, requiring modification to the routines that translate to and from the interlingua. Other conventional approaches, such as U.S. Patent No. 5,477,451, employ complex statistical or mathematical models to translate human text.
In general, conventional approaches at best manage the complexity of language in an ad-hoc instead of a systematic manner. As a result, it is difficult to extend such conventional systems to support a new language. Furthermore, such techniques are even more difficult to apply in mixed language situations, including, for example, computer programming languages embedded in a natural language context. Another drawback is that such systems are difficult to debug and therefore difficult to tweak to achieve high-quality translations.
SUMMARY OF THE INVENTION
There has long been a need for reliable, high quality, automated language translation. The necessity for a language translation system that is readily extensible to new human languages is apparent. There is also a need for a language translation system and methodology that are capable of handling mixed-language texts, especially those texts that also include an artificial language, such as a programming language.
These and other needs are addressed by the present invention, in which a source signal embodying knowledge is decomposed into a simple and regular internal representation. This internal representation is then transformed into another internal representation from which a target signal is constructed. Despite the simplicity of the internal representation, the complexity of language is appropriately localized within rules for decomposing the source signal into the internal representation and transforming the internal representation into another internal representation. In one embodiment, source signal decomposition and target signal construction are facilitated by lookups into a universal dictionary. Advantageously, extending and improving such a language translation system involves updating the universal dictionary, the decomposition rules,
and the mapping rules, thereby avoiding modification of hard-coded logic and internal data structures.
Accordingly, one aspect of the invention relates to a method and a computer-readable medium bearing instructions for translating a source signal embodying information according to a source language into a target signal embodying information according to a target language. The methodology involves analyzing the source signal to produce a first internal representation of epistemic instances corresponding to the information embodied in the source signal. The epistemic instances are fundamental semantic structures expressing a transformation of two objects or objective grammatical forms. The first internal representation is transformed into a second internal representation of epistemic instances according to the target language, and the target signal is constructed based on the second internal representation.
In various embodiments, the source language and the target language may be any of a natural language, a computer language, formatting conventions, and mathematical expressions. The source signal and the target signal can be realized as digital signals representing text, acoustic signals representing speech, optical signals representing characters, and as any other analog or digital signal. The described methodology is also applicable to transforming a source signal embodying information according to a knowledge representation relating to a knowledge discipline, for example, physics and engineering.
According to another aspect of the invention, a method and computer-readable medium bearing instructions for translating a source signal embodying information according to a source language into a target signal embodying information according to a target language involve storing related dictionary entries in a computer-readable medium.
Each related dictionary entry includes a source word form, a source grammatical form, a corresponding target word form, and a target grammatical form for the target word form.

The grammatical form in some embodiments relates to a sub-grammatical form for specifying the morphology of a word form including grammatical inflection and auxiliaries.
The source signal is analyzed to produce a first internal representation of the information embodied in the source signal based on the dictionary entries. The first internal representation is transformed into a second internal representation according to the target language, from which the target signal is constructed.
Advantageously, the dictionary entries properly localize much of the complexity of language. Word forms in the dictionary entries may correspond to one or multiple lexical words interspersed throughout a sentence.
In accordance with one feature, the source signal is analyzed by applying decomposition rules that describe how to partition the word forms in the source signal into three data sets constituting a triplet. One advantage of a triplet data structure is that it can provide a straightforward representation of an epistemic instance. According to another feature, at least some of the triplets in the first internal representation are mapped to produce the second internal representation by accessing a sequence of mapping rules. The mapping rules describe a correspondence between a source language triplet and a corresponding target language triplet.
Additional objects, advantages, and novel features of the present invention will be set forth in part in the description that follows and, in part, will become apparent upon examination or may be learned by practice of the invention. The objects and advantages of the invention may be realized and obtained by means of the instrumentalities and combinations particularly pointed out in the appended claims.
BRIEF DESCRIPTION OF THE DRAWINGS
The present invention is illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings, in which like reference numerals refer to similar elements and in which:
FIG. 1 schematically depicts a computer system upon which an embodiment of the present invention can be implemented;

FIG. 2 schematically depicts components of a universal language translation system;
FIG. 3(a) is a flowchart showing the operation of a universal language translation system;
FIG. 3(b) is a flowchart showing the operation of determining a grammatical form for a word;
FIG. 4(a) depicts an initial split tree before mapping;
FIG. 4(b) depicts the split tree during mapping;
FIG. 4(c) depicts the split tree at another point during mapping;
FIG. 4(d) depicts the split tree at still another point during mapping;
FIG. 4(e) depicts the split tree at yet another point during mapping; and FIG. 4(f) depicts the split tree after mapping.
DESCRIPTION OF THE PREFERRED EMBODIMENT
A method and apparatus for language translation are described. In the following description, for the purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the present invention. It will be apparent, however, to one skilled in the art that the present invention may be practiced without these specific details. In other instances, well-known structures and devices are shown in block diagram form in order to avoid unnecessarily obscuring the present invention.
THEORETICAL OVERVIEW
The present invention stems from the realization that there is a common, regular model for all languages and indeed any knowledge representation. This common model is referred to as a "universal grammar." A universal grammar differs from an interlingua, because a universal grammar presents a simple, regular semantic construction of information, while an interlingua is an eclectic conglomeration of grammatical and syntactical surface features of various human languages. A universal grammar is described in more detail in U.S. Application Serial No. 09/033,676.

In accordance with this model, there is a simple and universal means of expressing the information communicated by any language and, in fact, expressing all knowledge.
This universal means of expression is referred to herein as a decomposition into "epistemic instances." An epistemic instance, which underlies all moments of a being's consciousness and perceptions, is a semantic structure of the meaning of any moment of any language.
In essence, an epistemic instance expresses a transformation of two objects.
In a linguistic context, the transformation is referred to herein as the "metaverb"
and usually corresponds to lexical objects in a human language involving actions or relationships, such as verbs, prepositions, and conjunctions. The objects are referred to herein as "metanouns"
and usually correspond to lexical objects involving things or concepts, such as nouns, pronouns, determiners, and adjectives. Lexical objects include words, phrases, clauses, sentences, grammatical segments, textual ideas, acoustic phones and phonemes, character strokes of alphabetic letters and Chinese characters, and any other component of meaning.
Many metaverbs and metanouns can be further decomposed into one or more epistemic instances, expressing a transformation of objects.
For example, in the English clause "I have read the book under the table," the verb "have read" is the metaverb, the subject pronoun "I" is the first metanoun, and the object noun phrase "the book under the table" is the second metanoun, because "I" and "the book under the table" undergo a transformation with each other by reading. The second metanoun, "the book under the table," can be further decomposed into "the book" and "the table" transforming through "under." Here, the metaverb "under" pertains to how "the book" and "the table" are interrelated.
Since all information represented by a language (and indeed all knowledge as explained hereinafter) can be decomposed into epistemic instances and since each epistemic instance comprises a metaverb and two metanouns, it follows that any linguistic sentence can be internally represented with a simple data structure comprising a set of three elements that may be decomposed into other three-element sets. In contrast with some conventional attempts, this internal representation is simple and regular, involving decompositions of three-element epistemic instances, and is therefore applicable without change to any human language and readily manipulated in a machine such as a computer. As a result, the translation system is readily extensible to new languages.
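To make this three-element representation concrete, the following minimal sketch models an epistemic instance as a recursive triplet and builds the decomposition of the example clause by hand. The class and field names are illustrative assumptions and are not taken from the specification.

```python
from dataclasses import dataclass
from typing import Union

# A metanoun or metaverb slot may hold either a plain word form, nothing at all,
# or a further epistemic instance (the representation is recursive).
Slot = Union[str, "EpistemicInstance", None]

@dataclass
class EpistemicInstance:
    left_metanoun: Slot   # record set -1
    metaverb: Slot        # record set 0 (may be empty, e.g. an article with a noun)
    right_metanoun: Slot  # record set +1

# "I have read the book under the table"
clause = EpistemicInstance(
    left_metanoun="I",
    metaverb="have read",
    right_metanoun=EpistemicInstance(
        left_metanoun=EpistemicInstance("the", None, "book"),
        metaverb="under",
        right_metanoun=EpistemicInstance("the", None, "table"),
    ),
)
```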
Accordingly, the universal grammar provides the necessary direction for properly localizing the complexity of language. Since linguistic utterances have a simple, regular underlying structure, it follows that the complexity inherent in human language must be located elsewhere. More specifically, a universal dictionary is provided for handling the different shades of meaning and grammatical functions each word has, a sequence of split (or decomposition) rules is provided for breaking down a word stream into a decomposition of epistemic instances, and a sequence of mapping rules is provided for mapping the epistemic instances relating to moments of the source language into the epistemic instances relating to moments of the target language. Localizing the complexity within a universal dictionary, split rules, mapping rules, and reconstruction rules allows linguists who are not computer programmers to easily upgrade, extend, and even tweak the language translation system to produce an extensible, reliable, high-quality language translation system.

FIG. 1 is a block diagram that illustrates a computer system 100 upon which an embodiment of the invention may be implemented. Computer system 100 includes a bus 102 or other communication mechanism for communicating information, and a processor 104 coupled with bus 102 for processing information. Computer system 100 also includes a main memory 106, such as a random access memory (RAM) or other dynamic storage device, coupled to bus 102 for storing information and instructions to be executed by processor 104. Main memory 106 also may be used for storing temporary variables or other intermediate information during execution of instructions to be executed by processor 104.
Computer system 100 further includes a read only memory (ROM) 108 or other static storage device coupled to bus 102 for storing static information and instructions for processor 104. A storage device 110, such as a magnetic disk or optical disk, is provided and coupled to bus 102 for storing information and instructions.

Computer system 100 may be coupled via bus 102 to a display 112, such as a cathode ray tube (CRT), for displaying information to a computer user. An input device 114, including alphanumeric and other keys, is coupled to bus 102 for communicating information and command selections to processor 104. Another type of user input device is cursor control 116, such as a mouse, a trackball, or cursor direction keys for communicating direction information and command selections to processor 104 and for controlling cursor movement on display 112. This input device typically has two degrees of freedom in two axes, a first axis (e.g., x) and a second axis (e.g., y), that allows the device to specify positions in a plane.
The invention is related to the use of computer system 100 for language translation.
According to one embodiment of the invention, language translation is provided by computer system 100 in response to processor 104 executing one or more sequences of one or more instructions contained in main memory 106. Such instructions may be read into main memory 106 from another computer-readable medium, such as storage device 110.
Execution of the sequences of instructions contained in main memory 106 causes processor 104 to perform the process steps described herein. One or more processors in a multi-processing arrangement may also be employed to execute the sequences of instructions contained in main memory 106. In alternative embodiments, hard-wired circuitry may be used in place of or in combination with software instructions to implement the invention.
Thus, embodiments of the invention are not limited to any specific combination of hardware circuitry and software.
The term "computer-readable medium" as used herein refers to any medium that participates in providing instructions to processor 104 for execution. Such a medium may take many forms, including but not limited to, non-volatile media, volatile media, and transmission media. Non-volatile media include, for example, optical or magnetic disks, such as storage device 110. Volatile media include dynamic memory, such as main memory 106. Transmission media include coaxial cables, copper wire and fiber optics, including the wires that comprise bus 102. Transmission media can also take the form of acoustic or light waves, such as those generated during radio frequency (RF) and infrared (IR) data communications. Common forms of computer-readable media include, for example, a floppy disk, a flexible disk, hard disk, magnetic tape, any other magnetic medium, a CD-ROM, DVD, any other optical medium, punch cards, paper tape, any other physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH-EPROM, any other memory chip or cartridge, a carrier wave as described hereinafter, or any other medium from which a computer can read.
Various forms of computer readable media may be involved in carrying one or more sequences of one or more instructions to processor 104 for execution. For example, the instructions may initially be borne on a magnetic disk of a remote computer.
The remote computer can load the instructions into its dynamic memory and send the instructions over a telephone line using a modem. A modem local to computer system 100 can receive the data on the telephone line and use an infrared transmitter to convert the data to an infrared signal.
An infrared detector coupled to bus 102 can receive the data carried in the infrared signal and place the data on bus 102. Bus 102 carries the data to main memory 106.
from which processor 104 retrieves and executes the instructions. The instructions received by main memory 106 may optionally be stored on storage device 110 either before or after execution by processor 104.
Computer system 100 also includes a communication interface 118 coupled to bus 102. Communication interface 118 provides a two-way data communication coupling to a network link 120 that is connected to a local network 122. For example, communication interface 118 may be an integrated services digital network (ISDN) card or a modem to provide a data communication connection to a corresponding type of telephone line. As another example, communication interface 118 may be a local area network (LAN) card to provide a data communication connection to a compatible LAN. Wireless links may also be implemented. In any such implementation, communication interface 118 sends and receives electrical, electromagnetic or optical signals that carry digital data streams representing various types of information.
Network link 120 typically provides data communication through one or more networks to other data devices. For example, network link 120 may provide a connection through local network 122 to a host computer 124 or to data equipment operated by an Internet Service Provider (ISP) 126. ISP 126 in turn provides data communication services through the worldwide packet data communication network, now commonly referred to as the "Internet" 128. Local network 122 and Internet 128 both use electrical, electromagnetic or optical signals that carry digital data streams. The signals through the various networks and the signals on network link 120 and through communication interface 118, which carry the digital data to and from computer system 100, are exemplary forms of carrier waves transporting the information.
Computer system 100 can send messages and receive data, including program code, through the network(s), network link 120, and communication interface 118. In the Internet example, a server 130 might transmit a requested code for an application program through Internet 128, ISP 126, local network 122 and communication interface 118. In accordance with the invention, one such downloaded application provides for language translation as described herein. The received code may be executed by processor 104 as it is received, and/or stored in storage device 110, or other non-volatile storage for later execution. In this manner, computer system 100 may obtain application code in the form of a carrier wave.
It is to be understood that embodiments of the present invention are not limited to electronic computing machines, but may also be implemented on optical, biological, and quantum-state computing devices or any combination thereof as these devices become practical in the future. In fact, the present invention may be implemented by a suitably constructed android such as is disclosed in the co-pending patent application, U.S.
Serial No. 08/847,230.
TRANSLATION SYSTEM OVERVIEW
FIG. 2 is a schematic diagram of one embodiment of a translation system. The translation system comprises a translation engine 200, which is preferably implemented on a computer system as described hereinabove. The translation engine 200 includes three modules: a word stream analyzer 202 for decomposing a word stream in the source signal into an internal representation, a mapping unit 204 for transforming the internal representation, and a word stream construction module 206 for converting the transformed internal representation into a target signal.
Since the translation engine 200 preferably operates on digital signals, a preprocessor 210 and a postprocessor 220 are provided to accommodate the conversion of acoustic, optical, tactile, and other sensory media into "word forms" in the translation engine's preferred digital medium. The preprocessor 210 and postprocessor 220 may be implemented by several techniques, such as standard speech recognition and standard optical character recognition, by another instantiation of a suitably configured translation engine as described in detail hereinafter, or by any combination of such techniques.
The translation engine 200 is preferably configured by linguists with a data-driven programming process implemented in a linguist interface module 230. The linguist interface module 230 provides a graphical user interface as a front end to enable a linguist or other language expert to enter, correct, and update entries in the sequence of split rules 232, mapping rules 234, and the universal dictionary 236.
The universal dictionary 236 contains related entries for words in both the source and target languages. TABLE 1 illustrates some entries in one implementation of the universal dictionary 236. Corresponding words in the universal dictionary are identified by a common key value. For example, the English word "book", the Chinese word "shu", and the Italian word "libro" all have the same key value. Furthermore, each of these words has a grammatical form code (GF Code) of R0 for noun. The same word form may be present in more than one entry, in which case the word is semantically or grammatically ambiguous. For example, the English verb "read" is ambiguous, because it can correspond to the intransitive Chinese verb "dushu" (GF Code H2) as in "I read" or to the transitive Chinese verb "yuelan" (GF Code H1) as in "I read the book." These ambiguities are handled according to techniques that are described in more detail hereinafter.

TABLE 1

Key     Language  Word Form  GF Code
123456  English   book       R0
123456  Chinese   shu        R0
123456  Italian   libro      R0
345672  English   read       H2
345672  Chinese   dushu      H2
345672  Italian   leggere    H2
345673  English   read       H1
345673  Chinese   yuelan     H1
345673  Italian   leggere    H1
238474  English   I          S1
238474  Chinese   wo         S1
238474  Italian   io         S1

The grammatical form code (GF Code) identifies what functional part of speech the word form belongs to. For example, a GF Code of H1 specifies a transitive verb and a GF Code of H2 specifies an intransitive verb. The grammatical forms will later be used within the word stream analyzer 202 for decomposing the incoming word stream. TABLE 2 lists exemplary grammatical form codes, in which related grammatical form codes share a common prefix, thereby allowing for wildcard matching of grammatical forms.

TABLE 2

GF Code  Description
Q0       Adjective
Q1       Verbal Adjective - Past Participle
Q2       Verbal Adjective - Present Participle
I0       Adverb
I1       Conjunctive Adverb
I2       Disjunctive Adverb
I3       Adjunct Adverb
P0       Article
P1       Definite Article
P2       Indefinite Article
G0       Conjunction
G1       Coordinating Conjunction
G2       Correlative Conjunction
G3       Subordinating Conjunction
G4       Coordinating Pronoun
O0       Determiner
O1       Possessive Determiner
O2       Predeterminers
O3       Postdeterminers
O4       Quantitative Determiner
N0       Final Particle
N1       Demonstrative Particle
N2       Limitative Particle
N3       Emphatic Particle
N4       Definite Particle
N5       Affirmative Particle
N6       Negative Particle
N7       Interrogative Particle
N8       Exclamatory Particle
N9       Indecisive Particle
N10      Comparative Particle
N11      Auxiliary Particle
N12      Korean Nominative Particle
N13      Korean Dative Particle
N14      Korean Accusative Particle
N15      Korean Locative Particle
N16      Korean Instrumental Particle
N17      Korean Auxiliary Particle
N18      Korean Connective Genitive Particle
R0       Noun
R1       Proper Noun
R2       Common Count Noun
R3       Common Non-count Noun
R4       Verbal Noun - Past Participle
R5       Verbal Noun - Present Participle
M0       Preposition
S0       Pronoun
S1       Personal Pronoun
S2       Reflexive Pronoun
S3       Possessive Pronoun
S4       Interrogative Pronoun
S5       Relative Pronoun
S6       Demonstrative Pronoun
S7       Reciprocal Pronoun
S8       Universal Pronoun
S9       Partitive Pronoun
C0       Sentence Link
H0       Verb
H1       Transitive Verb
H2       Intransitive Verb
H3       Ditransitive Verb
H4       Complex Transitive Verb
H5       Stative Transitive Verb
H6       Stative Intransitive Verb
H7       Dynamic Transitive Verb
H8       Dynamic Intransitive Verb
H9       Copular Verb
H10      Co-verb Verb
H11      Reflexive Verb
H12      Impersonal Verb
H13      Modal Verb
H14      Auxiliary Verb
H15      Korean Adjectival Verb
H16      Korean Compound Verb

Preferably, special word formations are also identified. A special word formation is a group of individual words that semantically or grammatically act together as one. One way to identify a special word formation is to populate the universal dictionary 236 with an entry for one of the words of the word formation in order to trigger an ambiguity, thereby causing an ambiguity resolution routine to be executed. Executing the ambiguity resolution routines examines other word forms in the word stream to determine whether there is a special word formation. Some examples of special word formations include abbreviations, acronyms, words with apostrophes, proper nouns, non-translatable words (e.g., X), special hyphenated noun and adjective compounds, special unhyphenated noun and adjective compounds, special verbs with auxiliaries, special adjectival expressions, special adverbial expressions, special prepositional expressions, idioms, Chinese multiple-character words (cí), Arabic character words, French (and other) merged-vowel words, German and Dutch compound nouns and adjectives, cardinal and ordinal numbers, mathematical constructions, and other special-case words of human language such as Chinese measure words.
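Because related grammatical form codes in TABLE 2 share a common prefix, a rule can name a whole family of forms (for example, H for all verbs) and match any member of it (H1, H2, and so on). The following minimal sketch shows one way such wildcard matching could work; the function name and calling convention are assumptions made for illustration.

```python
def gf_matches(pattern: str, gf_code: str) -> bool:
    """Return True when gf_code belongs to the family named by pattern.

    A bare letter such as "H" matches every verb code (H0, H1, H2, ...),
    while a full code such as "H1" matches only itself.
    """
    return gf_code == pattern or (pattern.isalpha() and gf_code.startswith(pattern))

assert gf_matches("H", "H2")       # any verb matches the verb family
assert gf_matches("H1", "H1")      # exact match for a transitive verb
assert not gf_matches("H1", "H2")  # an intransitive verb is not a transitive verb
```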
The grammatical form system described hereinabove may be augmented by a sub-grammatical form table (not shown). A sub-grammatical form is a specialization of a grammatical form that further specifies the morphology or inflection of the word form appropriate to the grammatical form. For example, a sub-grammatical form for a Latin adjective would be a combination of its case, number, and gender, for example nominative feminine singular. Sub-grammatical forms for a particular word form can be stored in matrix form, wherein the columns correspond to verb tenses/voices/moods and noun cases, and the rows correspond to verb persons/numbers and noun genders/numbers/articulations.
Sub-grammatical forms can also be identified from a group of word forms, such as a verb and its auxiliaries.
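As an illustration of such a matrix, the sketch below stores a few sub-grammatical forms for a Latin adjective indexed by gender/number (rows) and case (columns). The keying scheme and the choice of adjective are assumptions made purely for illustration.

```python
# Sketch of a sub-grammatical form matrix for the Latin adjective "bonus" (good).
# Rows are gender/number combinations, columns are cases; only a few cells are
# filled in here for illustration.
sub_grammatical_forms = {
    "bonus": {
        ("feminine", "singular"):  {"nominative": "bona",  "accusative": "bonam"},
        ("masculine", "singular"): {"nominative": "bonus", "accusative": "bonum"},
        ("neuter", "singular"):    {"nominative": "bonum", "accusative": "bonum"},
    },
}

# Look up the nominative feminine singular form mentioned in the text.
form = sub_grammatical_forms["bonus"][("feminine", "singular")]["nominative"]
assert form == "bona"
```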
The mapping rules 234 specify how to transform the internal representation suitable for a source language into an internal representation suitable for a target language. TABLE
3 depicts exemplary mapping rules that are used to transform an English prepositional phrase into a Chinese prepositional phrase among other transformations.
Generally, an adjectival prepositional phrase in English of the form "X prep. Y" corresponds to the Chinese prepositional phrase "Y prep. de X." The first mapping rule states that the left and right metanouns, labeled -1 and +1 respectively, are swapped, and the second mapping rule states that the Chinese word "de" is appended to the metaverb. Another mapping rule from English to Chinese is deletion of definite articles (GF Code P1), since Chinese lacks definite articles. Finally, since the transitive verb (GF Code H1) comes in the same place in Chinese as in English, the mapping rule specifies a null transformation.

TABLE 3

Source   Target   GF Code  Action  Word  Set A  Set B
English  Chinese  M0       Swap    *     -1     +1
English  Chinese  M0       Add     de    0      0
English  Chinese  P1       Delete  *     -1     -1
English  Chinese  H1       Move    *     0      0

Referring to TABLE 3, the Source and Target fields indicate the source language and the target language respectively. The GF Code indicates the grammatical form of the record set, which can be a metaverb or a metanoun. The Action field specifies the action to take: "Combine" for combining two word forms into one, "Delete" for deleting a word from a record set, "Move" for moving a word form from one record set to another, "Separate" for separating a word form into two or more word forms, "Add" for adding a word form to a record set, and "Swap" for swapping the positions of two record sets. The Set A and the Set B arguments are used in conjunction with the Action field as appropriate and specify the particular record set used by the Action field: -1 for the left metanoun, 0 for the metaverb, and +1 for the right metanoun. The mapping rules 234 thus provide a powerful mechanism for handling and translating grammatical constructs.
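To make the record format of TABLE 3 concrete, the sketch below models a mapping rule and applies the Swap and Add actions to the triplet for "the book under the table." It is a simplified illustration that treats a triplet as the three record sets -1, 0, and +1; the class name, the reduced action set, and the calling convention are assumptions rather than the patent's implementation.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class MappingRule:
    source: str   # source language
    target: str   # target language
    gf_code: str  # grammatical form the rule matches
    action: str   # "Combine", "Delete", "Move", "Separate", "Add", or "Swap"
    word: str     # word-form argument, "*" when unused
    set_a: int    # -1 left metanoun, 0 metaverb, +1 right metanoun
    set_b: int

def apply_rule(rule: MappingRule, triplet: Dict[int, List[str]],
               matched_word: Optional[str] = None) -> None:
    """Apply one mapping rule to a triplet of record sets keyed -1, 0, and +1."""
    if rule.action == "Swap":
        triplet[rule.set_a], triplet[rule.set_b] = triplet[rule.set_b], triplet[rule.set_a]
    elif rule.action == "Add":
        triplet[rule.set_a].append(rule.word)
    elif rule.action == "Delete" and matched_word in triplet[rule.set_a]:
        triplet[rule.set_a].remove(matched_word)
    # "Move", "Combine", and "Separate" are omitted from this sketch.

# English "X prep. Y" -> Chinese "Y prep. de X" for "the book under the table":
triplet = {-1: ["the book"], 0: ["under"], +1: ["the table"]}
apply_rule(MappingRule("English", "Chinese", "M0", "Swap", "*", -1, +1), triplet)
apply_rule(MappingRule("English", "Chinese", "M0", "Add", "de", 0, 0), triplet)
# triplet is now {-1: ["the table"], 0: ["under", "de"], +1: ["the book"]}
```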
The split rules 232 specify how to decompose a grammatically tagged word stream into a "split tree." A split tree is a hierarchical arrangement of triplets, wherein each triplet partitions one or more word forms into three data sets. The three data sets correspond to the left metanoun, the metaverb, and the right metanoun, respectively, of an epistemic instance.
Since the arrangement is hierarchical, the word forms of a metanoun may be partitioned yet again into two metanouns and a metaverb. Thus, a lower-level triplet partitions the word forms embedded in a data set of a higher-level triplet, and the top-level triplet partitions the words of the source signal. One implementation of a split tree, using relational database constructs, is described in more detail hereinafter.

TABLE 4

Language  Sequence  Package Name  Split Type
English   1100      SENT_SVO      4
English   1235      PREP_POM_NP   1

TABLE 4 illustrates some exemplary split rules 232. The Language field indicates for which language the split rule is appropriate. The Sequence field provides an ordering of the split rules, generally from the character and word composition level, to the formatting level, to the paragraph level, to the sentence level, to the clause level, and down to the phrase and word level. The Package Name identifies an arrangement of instructions that is configured to identify whether a particular split is appropriate and flag which word forms belong to the metaverb (record set 0) of the split. In the example, SENT_SVO is a package for identifying and splitting subject-verb-object sentences and PREP_POM_NP for identifying and splitting prepositional phrases that are postmodifiers to noun phrases.
The Split Type field specifies how to perform the split, generally by identifying which word forms are to be put in the metaverb record set. For example, split type 1 indicates to put only the selected word form in the metaverb record set, split type 3 is for a null metaverb, and split type 4 is for a metaverb having more than one word form, such as a verb and the adverbs that modify it. The remaining word forms, generally to the left and right of the metaverb, are put into the -1 and +1 record sets, respectively, although a special-purpose split type may be devised to handle a particularly exotic source language construct.
When a record set is split, record set 0, corresponding to the metaverb, is tagged with a node label indicating the type of split, e.g. SENT_SVO.
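A minimal sketch of how a split rule of TABLE 4 might drive the partitioning of a record set into the -1, 0, and +1 record sets of a triplet follows. The rule's package is reduced here to a simple predicate that picks out the metaverb word forms; that simplification, and all names below, are assumptions made for illustration rather than the actual ID-package mechanism.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Word = Tuple[str, str]  # (word form, grammatical form code)

@dataclass
class SplitRule:
    language: str
    sequence: int
    package_name: str
    split_type: int
    # Illustrative stand-in for the rule's package: it returns the indices of the
    # word forms that belong to the metaverb, or None if the rule does not apply.
    find_metaverb: Callable[[List[Word]], Optional[List[int]]]

def sent_svo(words: List[Word]) -> Optional[List[int]]:
    """Flag the verb group (H-prefixed codes) of a simple subject-verb-object sentence."""
    verb_indices = [i for i, (_, gf) in enumerate(words) if gf.startswith("H")]
    return verb_indices or None

def apply_split(rule: SplitRule, words: List[Word]) -> Optional[Dict[int, List[Word]]]:
    """Partition a record set into -1, 0, and +1 record sets around the metaverb."""
    indices = rule.find_metaverb(words)
    if indices is None:
        return None
    # Record set 0 would also be tagged with the node label, e.g. SENT_SVO.
    return {-1: words[:indices[0]],
            0: [words[i] for i in indices],
            +1: words[indices[-1] + 1:]}

rule = SplitRule("English", 1100, "SENT_SVO", 4, sent_svo)
words = [("I", "S1"), ("have", "H14"), ("read", "H1"), ("the", "P1"),
         ("book", "R0"), ("under", "M0"), ("the", "P1"), ("table", "R0")]
split = apply_split(rule, words)
# split[-1] holds "I", split[0] holds the metaverb "have read",
# and split[+1] holds "the book under the table"
```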
For decomposing English language text, the following split rules have been found to be useful:
• simple sentence types including subject-verb, subject-verb-object, subject-verb-indirect object-direct object, subject-verb-direct object-indirect object, subject-verb-predicate, and subject-verb-object-predicate;
• rearrangements of simple sentence types including cleft, extraposed "it ... that" subjects, existential "there" or "have", left dislocation, right dislocation, subject-verb inversion, and exchanged object and predicate;
• complex sentence types including coordination; subordination linked as main-subordinate for subject clauses, complement clauses, phrase constituents, variably connected clauses, subordination linked to two or more main clauses, and appositive clauses; varied coordination and subordination; subordinate clause structures such as nominal clauses, relative clauses, adverbial clauses, and comparative clauses; and clause clusters or punctuation;
• interrogative sentence types including yes/no questions, positive orientation, negative orientation, tag questions, declarative questions, yes/no questions with modal auxiliaries, wh-questions, alternative questions, exclamatory questions, and rhetorical questions;
• other sentence types including ellipted, general punctuated, formula sentences, aphoristic sentences, block language sentences, unfinished sentences, and non-clauses;
• noun phrase [DET+PRM+HN+POM] types including subject, direct object, indirect object, subject predicative, object predicative, complement of a preposition, premodifier of a noun or noun phrase, vocative, adverbial, premodifier of an adjective, premodifier of a preposition, premodifier of an adverb, postmodifier of a noun, and postmodifier of an adjective;
• verb phrase [AUX+V] types including primary auxiliary and modal auxiliary;
• adjective phrase [PRM+ADJ+POM] types including premodifier of a noun, subject predicative, object predicative, postmodifier of a pronoun, postmodifier of a noun, nominal adjective, and complement of a preposition;
• adverb phrase [PRM+ADV+POM] types including premodifier of an adjective, premodifier of an adverb, adverbial, subject predicative, premodifier of a preposition, premodifier of a pronoun, premodifier of a determiner, premodifier of a numeral, premodifier of a noun phrase, postmodifier of a noun phrase, postmodifier of an adjective or adverb, object predicative, and complement of a preposition; and
• prepositional phrase [PRM+PREP+PRCOMP] types including noun phrase as complement, -ing participle clause as complement, wh-clause as complement, adverb as complement, adjective as complement, postmodifier of a noun, postmodifier of an adjective, subject predicative, object predicative, adverbial, and complement of a verb.
One way to facilitate the entry of special word forms and sub-grammatical forms is to establish a word construction table (not shown), which allows for groups of word forms to be specified. The word construction table can also include positional information to help linguists in one embodiment to write the appropriate split rules 232, mapping rules 234, and entries in the universal dictionary 236 for these special groups of words.
This positional information may include: any allowable word position, same as the English position, immediately after the keyword, immediately before the keyword, anywhere after the keyword, separated by a word or grammatical form, in combination, at the end of the sentence, and at the beginning of the sentence. In another embodiment, some of the split rules 232, mapping rules 234, and entries in the universal dictionary 236 can be automatically generated from the word construction. Thus, special word formations, which map multiple word constructions in one language to a single unit in another language, such as idioms, can be handled.
TABLE 5 illustrates a portion of an exemplary word construction table, set up for the verb phrase "have been running," where position code {A} means any allowable position and position code {B} means before the keyword and separated by an adverb.

TABLE 5 - SAMPLE OF A WORD CONSTRUCTION TABLE

Keyword  Pos0  Construction       Aux1  Pos1  Aux2  Pos2  ...  AuxN  PosN
running  {A}   have been running  have  {B}   been  {B}   ...  -     -

Referring to FIG. 3(a), depicted is a flowchart illustrating steps performed in the operation of an embodiment of a universal translation system. Step 300 is a preprocessing step that receives a source signal as input in one form, such as a digital signal representing text in a source language, an acoustic signal representing speech in a source language, or an optical signal representing characters in the source language. This preprocessing step 300 may be performed by standard speech recognition or optical character recognition implementations.
In one embodiment, however, the preprocessing functionality is handled by another instantiation of the translation engine, wherein the ancillary data structures are appropriately populated. For example, to perform speech recognition as a preprocessing step, the acoustic wave forms or "phones" are stored in the universal dictionary, split rules are devised to recognize phrase, word, and segment boundaries among the phones, and mapping rules are formulated to translate the phones into textual words, for example into ASCII
and Unicode.
In fact, since the phonemicization of aural speech sounds into recognizable phonemes and then into words is fundamentally a grammatical process, another embodiment of the universal translation system, operating according to the principles of a universal grammar, is an ideal choice for speech recognition. Thus, even the preprocessing and postprocessing stages can be implemented by embodiments of the universal translation system.
The output of preprocessing in step 300 is a sequence of word forms of the input speech or text that are embodied in the source signal. This sequence of word forms constitutes a "record set". For example, a person may utter the sentence "I have read the book under the table" into a microphone. This acoustic signal, the person's utterance, is converted by a microphone into analog electric signals. Since the preprocessor module 210 is typically implemented on a digital computer system, the analog electric signals are converted by an analog-to-digital converter to produce a digital signal embodiment of the person's utterance. After preprocessing in step 300, a record set 400, illustrated in FIG. 4(a), is produced. In one implementation, specifically with a relational database, the record set 400 comprises eight rows of a table, one row for each word embodied in the source signal.
WORD STREAM ANALYSIS
Referring back to FIG. 3(a), each word in the record set 400 is successively visited (step 310) by the word stream analyzer 202. More specifically, a grammatical form code is determined for each word in the record set 400 in step 320. FIG. 3(b) illustrates one technique for determining the grammatical form code. First, the word form under consideration is looked up in the universal dictionary 236 (step 321). The number of entries returned from the universal dictionary 236 lookup determines how the execution will branch (step 323). If there are no entries in the universal dictionary 236 for the word form, then the grammatical form is established for the word form as a proper noun (step 325).
In the working example of "I have read the book under the table," no word form is a proper noun.
If, on the other hand, there is exactly one entry in the universal dictionary 236, then execution branches to step 329, where the grammatical form within the entry of the universal dictionary 236 for the word form is established. In the working example, the pronoun "I" has exactly one entry (see TABLE 1) and therefore the grammatical form of S1 for personal pronoun is established for the source word form "I." Please note that this example is for purposes of illustration and will probably not reflect the state of an actual universal dictionary 236. For example, useful embodiments of the universal dictionary 236 will include a noun entry for every word to handle self-referential words, such as the word "I" in the sentence, "The word 'I' is a pronoun." Furthermore, if the universal dictionary includes Japanese, then there would be a plurality of entries for the personal pronoun "I."
Alternatively, there may be a plurality of entries in the universal dictionary for the word form. This state of affairs is due to the word form's being either grammatically or semantically ambiguous or both. In the example with TABLE 1, an ambiguity is inherent in any source word stream that includes the word "read", because the verb "read" might be transitive ("to read a book") or intransitive ("to read, be literate"), necessitating the translation of different Chinese words, yuelan and dushu respectively. To resolve this ambiguity, a word resolution module, which is an arrangement of instructions such as a subroutine or an ID package, is executed (step 327) to decide which among the various entries for the word "read" is best.
When executed, the ID package can perform any effectual technique for choosing one of the meanings of the facially ambiguous word form, including inspecting other words in the same context, use of statistical techniques, and even spawning an invocation of the word stream analyzer 202 and the mapping unit 204. If the ID package is unable to disambiguate the word form, then one of the entries will be picked as a default. In the exemplary sentence, the transitive meaning corresponding to the Chinese word yuelan, detectable on account of the direct object noun phrase "the book under the table", would be selected. Accordingly, the grammatical form H1 for transitive verb would be established for the word form (step 329).
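The branch logic of FIG. 3(b) can be sketched as follows. The universal dictionary is modeled as a small in-memory list of entries, and the ambiguity resolution step is reduced to a placeholder that falls back to a default entry; both simplifications, and the function names, are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class DictionaryEntry:
    key: int        # common key linking corresponding words across languages
    language: str
    word_form: str
    gf_code: str

UNIVERSAL_DICTIONARY: List[DictionaryEntry] = [
    DictionaryEntry(345672, "English", "read", "H2"),  # intransitive sense
    DictionaryEntry(345673, "English", "read", "H1"),  # transitive sense
    DictionaryEntry(238474, "English", "I", "S1"),
]

def determine_grammatical_form(word_form: str, language: str) -> str:
    """Steps 321-329 of FIG. 3(b): look up the word and establish its GF code."""
    entries = [e for e in UNIVERSAL_DICTIONARY
               if e.language == language and e.word_form == word_form]
    if not entries:
        return "R1"                # step 325: unknown word form treated as a proper noun
    if len(entries) == 1:
        return entries[0].gf_code  # step 329: single entry, use its grammatical form
    return resolve_ambiguity(word_form, entries)  # step 327: ID package

def resolve_ambiguity(word_form: str, entries: List[DictionaryEntry]) -> str:
    # Placeholder for the ID package; a real implementation could inspect the
    # surrounding words or apply statistical techniques.  Here the first entry
    # simply serves as the default.
    return entries[0].gf_code

assert determine_grammatical_form("I", "English") == "S1"
assert determine_grammatical_form("Datig", "English") == "R1"  # proper noun
```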
DECOMPOSITION
Referring again to FIG. 3(a), step 330 is the next stage performed by the word stream analyzer 202. More specifically, step 330 iterates over the sequence of split rules 232. For each of the split rules 232, the record set is split according to the split rule if the split rule matches the word forms and grammatical forms in the record set (step 340). In the example, split rule 1100 of TABLE 4 matches against record set 400, because record set 400 is a simple sentence in subject-verb-object order. Accordingly, split rule 1100 causes record set 400 to be partitioned, that is, disjointedly subdivided, into three data sets: record set 410 containing the pronoun "I" and corresponding to a metanoun as subject in this epistemic instance, record set 412 with the word forms "have" and "read" as the metaverb, and the direct object "the book on the table" in a record set 414 as the right metanoun. If adverbs were present, they would go in the metaverb record set 412.
After this split has occurred, the process is repeated by looping back to step 330. On a later iteration, split rule 1235 of TABLE 4 will eventually match record set 414 and cause the record set 414 to be decomposed into three data sets. The -1 record set is record set 420 and contains "the book" as the left metanoun. The metaverb is handled in the 0 record set 422 and includes the preposition "on," and the right metanoun in the +1 record set 424 contains "the table." As also illustrated in FIG. 4(a), record set 420 and record set 424 are further decomposed into triplet 430, 432, and 434 and triplet 440, 442, and 444, respectively. What is interesting about the metaverbs 432 and 442 is that they can be blank to express, for example, a definite article transforming with a noun. While the sample translation does not show the splitting of a metaverb, a metaverb too may be further decomposed, for example, if it contains an adverbial prepositional phrase.

In accordance with one embodiment, the split tree of record set 400 is implemented in a table on a relational database system. More specifically, each record set is identified by its own key in the initial data set, a level of decomposition, the particular split set (-1, 0, or +1), and the key of the word form to the universal dictionary 236. For example, TABLE 6 illustrates an exemplary split table corresponding to the split tree of FIG. 4(a), as implemented by a relational database table, with the actual source word form substituted for the universal dictionary 236 key for clarity.

TABLE 6

Language  Key  Level  Set  Word
English   1    0      0    I
English   2    0      0    have
English   3    0      0    read
English   4    0      0    the
English   5    0      0    book
English   6    0      0    under
English   7    0      0    the
English   8    0      0    table
English   1    1      -1   I
English   2    1      0    have
English   3    1      0    read
English   4    1      +1   the
English   5    1      +1   book
English   6    1      +1   under
English   7    1      +1   the
English   8    1      +1   table
English   4    2      -1   the
English   5    2      -1   book
English   6    2      0    under
English   7    2      +1   the
English   8    2      +1   table
English   4    3      -1   the
English   0    3      0    -
English   5    3      +1   book
English   7    3      -1   the
English   0    3      0    -
English   8    3      +1   table
For purposes of clarity, FIG. 4(a) and TABLE 6 illustrate a split table holding an actual source language word form, such as "I", in each record. However, in a preferred embodiment, the word form is replaced within the split tree by the appropriate key to the universal dictionary 236 entry, facilitating a direct access to the target language word form.
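The split-table representation can be sketched with an in-memory relational table. The schema below follows the Language, Key, Level, Set, and Word columns of TABLE 6, and the query shows how the most deeply decomposed record sets could be selected for reconstruction; the schema, identifiers, and query are illustrative assumptions rather than the patent's implementation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE split_tree (
        language TEXT,     -- source language of the word form
        key      INTEGER,  -- position of the word form in the source stream
        level    INTEGER,  -- depth of decomposition
        set_id   INTEGER,  -- -1 left metanoun, 0 metaverb, +1 right metanoun
        word     TEXT      -- word form (or universal dictionary key)
    )
""")

# Level 1 of the worked example: "I" | "have read" | "the book under the table"
rows = [
    ("English", 1, 1, -1, "I"),
    ("English", 2, 1,  0, "have"),
    ("English", 3, 1,  0, "read"),
    ("English", 4, 1, +1, "the"),
    ("English", 5, 1, +1, "book"),
    ("English", 6, 1, +1, "under"),
    ("English", 7, 1, +1, "the"),
    ("English", 8, 1, +1, "table"),
]
conn.executemany("INSERT INTO split_tree VALUES (?, ?, ?, ?, ?)", rows)

# Reconstruction visits the most deeply decomposed record sets first.
deepest = conn.execute(
    "SELECT word FROM split_tree "
    "WHERE level = (SELECT MAX(level) FROM split_tree) ORDER BY key"
).fetchall()
```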

FIG. 4(b) depicts a more compact schematic of the same split tree, but with elimination of visually redundant elements.
MAPPING
Referring again to FIG. 3(a), the next step, which is performed by mapping unit 204, is to apply the mapping rules (step 350). More specifically, step 350 iterates over the sequence of mapping rules 234. For each of the mapping rules 234, the record set is mapped according to the mapping rule if the mapping rule matches with any of the node labels, word forms, and grammatical forms in the record set (step 360).
Referring to FIG. 4(b) of the example, applying the mapping rule that deletes definite articles, as in TABLE 3, causes two triplets to be transformed. In the transformation, record sets may be swapped, and word forms may be added, deleted, changed, and moved around within and between record sets. Furthermore, the grammatical forms of the source language are also transformed into the appropriate target language grammatical forms as indicated by the mapping rule.
Mapping rules can be sequenced similarly to the split rules so that they execute in a predetermined order. Sequencing the mapping rules helps to simplify the implementation of the universal translation system, for example, by placing swaps last.
First, the triplet comprising record sets 430, 432, and 434 is transformed to produce record sets 450, 452, and 454 as shown in FIG. 4(c). In this transformation, the definite article is deleted from record set 430 as shown in record set 450.
Furthermore, for purposes of illustration, the word form of transformed record set 454 is shown in its Chinese form, that is shu, to indicate this record set has been mapped by a mapping rule, although the preferred form is a key into the appropriate universal dictionary entry.
Similarly, record sets 440, 442, and 444 are mapped to record sets 460, 462, and 464, wherein the definite article is deleted in record set 460 and the word form in record set 464 is shown herein for illustrative purposes as the Chinese word form zhuozi.
The next mapping rule that occurs in this example is transforming the prepositional phrase at metaverb record set 422 according to the dictates of Chinese grammar.
With reference to FIG. 4(d), this mapping rule causes record sets 452 and 462 to be swapped and the Chinese particle de to be added after the preposition shang (for "on").
In the relational database implementation, since the key values serve to order the word forms in the word stream, new key values should be assigned to reflect that the former left-hand side (record set -1) now follows the former right-hand side (record set +1). In one such implementation, an extra column for the new key is added to the split table.
The next mapping rule, whose result is shown in FIG. 4(e), handles transitive verbs.
Although this mapping is largely a null transformation, the past tense sub-grammatical form is converted from "have read" to the Chinese "yuelan ... le" in record set 482.
Furthermore, record set 480, corresponding to the "I" of the original record set 410, is displayed for purposes of illustration with the Chinese word form "wo."
TARGET WORD STREAM RECONSTRUCTION
After all the mapping rules 234 have been applied in steps 350 and 360 of FIG. 3(a), the target word stream is reconstructed by the word stream construction module 206 from the transformed split tree. With continued reference to FIG. 3(a), the nodes of the split tree, that is the record sets, are successively visited in step 370, preferably in a bottom-to-top, left-to-right order. At step 380, each metaverb record set 0 is converted into a record set that also includes the metanoun record sets -1 and +1.
Generally, while constructing, the target word forms will be arranged in a left-to-right order, but other orderings may be effectuated as appropriate and as recorded in the word construction table. Although FIGS. 4(a)-4(f) illustrate the split tree with a source or target language word form instead of a key into the universal dictionary 236, it is during this step that the universal dictionary 236 key is finally resolved into the target word form by a lookup operation.
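The bottom-to-top, left-to-right reconstruction can be sketched as a recursive walk over the transformed split tree, reusing the recursive triplet representation sketched earlier; the final dictionary lookup is reduced here to a plain mapping. The names and the simplified lookup are assumptions made for illustration.

```python
from typing import List, Tuple, Union

# A node of the transformed split tree is either a word form (in a preferred
# embodiment, a universal dictionary key), an empty metaverb slot (None), or a
# (left metanoun, metaverb, right metanoun) triplet.
Node = Union[str, None, Tuple["Node", "Node", "Node"]]

def reconstruct(node: Node, target_words: dict) -> List[str]:
    """Flatten a transformed split tree into the target word stream."""
    if node is None:
        return []                              # empty metaverb slot
    if isinstance(node, str):
        return [target_words.get(node, node)]  # resolve key to target word form
    left, metaverb, right = node
    # Default left-to-right order; special word-placement rules could reorder here.
    return (reconstruct(left, target_words)
            + reconstruct(metaverb, target_words)
            + reconstruct(right, target_words))

# Transformed tree for the working example, with the particle "le" already
# placed at the end of the clause by the mapping rules:
tree: Node = ("wo", "yuelan", (("zhuozi", "shang de", "shu"), None, "le"))
assert " ".join(reconstruct(tree, {})) == "wo yuelan zhuozi shang de shu le"
```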
For the working example, the reconstruction of the target word stream is illustrated in FIG. 4(f). Traversing bottom-to-top and left-to-right, record sets 450, 452, and 454 are first visited. In the relational database implementation, these records can be chosen by selecting the records with the highest level number and the highest new key values. The combination of record sets 450, 452, and 454, which is trivial, is reflected in record set 474 as "shu". Similarly, the other level 3 record sets, namely record sets 460, 462, and 464, are reconstructed into record set 470 as "zhuozi".

At the next higher level, record sets 470, 472, and 474 are coalesced in the normal left-to-right order as "zhuozi shang de shu" in record set 484. This record set 484 is the Chinese target text that corresponds to the English source text of "the book under the table." At a still higher level, record set 480 with "wo", record set 482 with "yuelan ... le", and record set 484 with "zhuozi shang de shu" are combined together.
Therefore, the reconstructed final order shown in result set 490 is "wo yuelan zhuozi shang de shu le," because the particle le belongs at the end of the clause, as the Chinese translation of the English clause "I have read the book under the table." Special word placement rules may be handled by the mapping rules in one embodiment or by special reconstruction rules in an alternative embodiment.
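One way to realize such a placement rule, sketched here under the assumption that clause-final particles are tagged during mapping and moved in a small post-pass over the reconstructed stream (the "<final>" tag is an invented convention, not the patent's notation):

```python
# Sketch of a special word-placement pass: particles tagged as clause-final
# during mapping are pulled out of the word stream and re-appended at the
# end of the clause.

def place_clause_final_particles(words):
    finals = [w[len("<final>"):] for w in words if w.startswith("<final>")]
    body = [w for w in words if not w.startswith("<final>")]
    return body + finals

stream = ["wo", "yuelan", "<final>le", "zhuozi", "shang", "de", "shu"]
print(" ".join(place_clause_final_particles(stream)))
# -> "wo yuelan zhuozi shang de shu le"
```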
Referring back to FIG. 3(a), the target word stream undergoes a post-processing step 390 by post-processor 220. Post-processor 220 performs analogous sensory medium conversions, similar to but the opposite of the conversions performed by pre-processor 210. For example, the result of post-processing in step 390 may be a digital signal representing text in the target language, an acoustic signal representing speech in the target language (as produced by a speech synthesizer), or an optical signal representing characters in the target language, for example, as displayed on a cathode-ray tube or printed out on a piece of paper.
EXTENSIONS AND APPLICATIONS
It is evident that the universal language translation model described herein is a very powerful mechanism for manipulating information in any form. Therefore, the present invention is not limited merely to translating from one human language to another. In fact, the principles discussed herein are applicable to transforming any source signal that embodies information. Information may be defined as a knowledge representation relating to any knowledge discipline that is ultimately useful to any sentient or conscious creature such as a human being.
For example, information represented according to formatting conventions and languages such as HyperText Markup Language (HTML) can be usefully transformed.
Other examples of knowledge disciplines include engineered systems such as A/D converters and satellite systems, mathematical theories and expressions such as y = f(x) and F = ma, and biological orders such as DNA replication. In addition, embodiments are applicable to controlling dynamic systems such as factories, power plants, and financial institutions.
As another example, artificial languages such as computer programming languages are fairly straightforward to translate, because programming languages are much less ambiguous than natural languages. Thus, by analyzing source code into a triplet internal representation or into an internal representation of epistemic instances, computer source code can be translated into object code like a conventional compiler, or into the source code of another high-level language, for example from FORTRAN to C++.
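As a toy illustration of decomposing an artificial-language statement into the triplet internal representation, the sketch below splits a single assignment on its operators; the decompose function and its operator list are assumptions for illustration only, not a real parser.

```python
# Toy decomposition of a programming-language statement into nested triplets
# (left object, transformer, right object).

def decompose(expr: str):
    """Split on the first lowest-precedence operator present, recursing on both sides."""
    for op in ("=", "+", "-", "*", "/"):
        parts = expr.split(op, 1)
        if len(parts) == 2:
            left, right = (s.strip() for s in parts)
            return (decompose(left), op, decompose(right))
    return expr.strip()          # a bare identifier or literal

print(decompose("x = a + b"))
# -> ('x', '=', ('a', '+', 'b'))
```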
Similarly, object code, that is, machine-executable instructions, embodies information in the operations the computer will carry out. Thus, by properly setting up the universal dictionary 236, the mapping rules 234, and the split rules 232, object code can be transformed into knowledge embodiments according to other knowledge representations, for example, source code, object code for another processor, or even a natural, human language. In fact, since programmable digital logic devices such as gate arrays are described by a hardware description language, object code can be translated into the hardware description language, and then directly manufactured. Since some implementations are capable of translating natural and artificial languages into object code and since object code can be translated into hardware, a mechanism is provided for building Application Specific Integrated Circuits (ASICs) from a functional source code program, an executable file, or even a precise English language description, for example, "Make me an operational amplifier having a gain of ...".
In another application, a universal machine translator can be integrated into telecommunications devices to produce a "universal communicator." For example, either or both of the pre-processing unit 210 and post-processing unit 220 can be coupled into a wireless or wireline telephone system and network, a multimedia communications system, a modem, a facsimile machine or telecopier, a computer system and/or network, a pager, a radio and/or television system, a radar, sonar, infrared, or optical communications system, a photocopy machine, a hand-held, lap-top, or body-worn communications device, or any other machine communications device for communication and operation in any knowledge discipline.
While this invention has been described in connection with what is presently considered to be the most practical and preferred embodiment, it is to be understood that the invention is not limited to the disclosed embodiment, but on the contrary, is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

Claims (32)

1. A method of translating a source signal embodying information according to a source language into a target signal embodying information according to a target language, said method comprising the steps of:
storing a plurality of related dictionary entries in a computer-readable medium, each of the related dictionary entries including a source word form according to the source language, a source grammatical form for the source word form, a corresponding target word form according to the target language, and a target grammatical form for the target word form;
analyzing the source signal to produce a first internal representation of epistemic instances of the information embodied in the source signal based on the related dictionary entries;
transforming the first internal representation to produce a second internal representation of epistemic instances according to the target language; and constructing the target signal based on the second internal representation and the related dictionary entries;
wherein the epistemic instances each consist of a pair of objects and a transformer of the pair of objects;
wherein the step of analyzing includes the steps of:
selecting a plurality of word forms embedded in the source signal;
determining respective grammatical forms for the selected word forms based on the dictionary; and decomposing the source signal into the first internal representation based on at least some of the word forms and the respective grammatical forms;
the method characterized in that the step of determining respective grammatical forms for the word forms includes the steps of:
accessing the dictionary to identify one or more entries in the dictionary corresponding to a selected word form from among the word forms embedded in the source signal;
establishing whether there is a plurality of entries in the dictionary corresponding to the selected word form;

if there is a plurality of entries in the dictionary corresponding to the selected word form, then executing an arrangement of instructions, associated with the selected word form, to select the source grammatical form in one of the plurality of entries based on other word forms embedded in the source signal; and if there is a single entry in the dictionary corresponding to the selected word form, then selecting the source grammatical form in the single entry.
2. The method of claim 1, wherein the step of analyzing includes the step of identifying special word formations.
3. The method of claim 1, wherein the step of determining respective grammatical forms for the word forms includes the steps of determining a part of speech and a sub-grammatical form.
4. The method of claim 1, further comprising the step of storing a sequence of decomposition rules in a computer-readable medium, each of the decomposition rules describing a condition under which the decomposition rule applies and how to partition a plurality of word forms in the source language into three data sets forming a triplet, under the condition the decomposition rule applies.
5. The method of claim 4, wherein the step of decomposing the source signal into the first internal representation based on at least some of the word forms and the respective grammatical forms includes the step of:
evaluating whether the condition under which one of the decomposition rules applies to a plurality of the word forms based on at least some of the word forms and the respective grammatical forms; and if the condition under which one of the decomposition rules applies, then partitioning the plurality of the word forms into three data sets to form a triplet in accordance with said one of the decomposition rules.
6. The method of claim 4, wherein:
the first internal representation comprises a first hierarchical arrangement of triplets, including:
a top-level triplet partitioning word forms embedded in the source signal into three data sets thereof, and a lower-level triplet partitioning word forms embedded in a data set of a higher-level triplet into three data sets thereof; and the step of transforming the first internal representation to produce a second internal representation according to the target language includes the step of mapping at least some of the triplets in the first hierarchical arrangement to produce a second hierarchical arrangement of triplets according to the target language.
7. The method of claim 6, further comprising the step of storing a sequence of mapping rules in a computer-readable medium, each of the mapping rules describing a triplet according to the source language and a corresponding triplet according to the target language.
8. The method of claim 7, wherein the step of mapping at least some of the triplets in the first hierarchical arrangement to produce a second hierarchical arrangement of triplets according to the target language includes the step of mapping the at least some of the triplets in the first hierarchical arrangement to produce a second hierarchical arrangement of triplets based on the mapping rules.
9. The method of claim 8, wherein the step of mapping at least some of the triplets in the first hierarchical arrangement to produce a second hierarchical arrangement of triplets according to the target language includes the steps of:
accessing the sequence of mapping rules to find a source triplet in the first hierarchical arrangement of triplets that matches the triplet according to the source language of an accessed mapping rule;

generating a target triplet based on the corresponding triplet according to the target language; and producing the second hierarchical arrangement based on the target triplet.
10. The method of claim 6, wherein:
the triplets include references to at least some of the dictionary entries;
and the step of constructing the target signal based on the second internal representation and the dictionary entries includes the steps of traversing the second hierarchical arrangement of triplets;
while traversing the second hierarchical arrangement, accessing the dictionary to identify target word forms based on the references to at least some of the dictionary entries.
11. The method of claim 1, wherein the source language includes a natural language.
12. The method of claim 11, wherein the source signal includes any of a source digital signal representing text in the source language, a source acoustic signal representing speech in the source language, and a source optical signal representing characters in the source language.
13. The method of claim 11, wherein the target language includes another natural language.
14. The method of claim 13, wherein the target signal includes any of a target digital signal representing text in the target language, a target acoustic signal representing speech in the target language, and a target optical signal representing characters in the target language.
15. The method of claim 1, wherein the source language includes a computer language and the target language includes another computer language.
16. The method of claim 1, wherein the source language includes formatting conventions and the target language includes other formatting conventions.
17. A computer-readable medium bearing instructions for translating a source signal embodying information according to a source language into a target signal embodying information according to a target language, said instructions arranged, when executed, to cause one or more processors to perform the steps of:
storing a plurality of related dictionary entries in a second computer-readable medium, each of the related dictionary entries including a source word form according to the source language, a source grammatical form for the source word form, a corresponding target word form according to the target language, and a target grammatical form for the target word form;
analyzing the source signal to produce a first internal representation of epistemic instances of the information embodied in the source signal based on the related dictionary entries;
transforming the first internal representation to produce a second internal representation of epistemic instances according to the target language; and constructing the target signal based on the second internal representation and the related dictionary entries;
wherein the epistemic instances each consist of a pair of objects and a transformer of the pair of objects;
wherein the step of analyzing includes the steps of:
selecting a plurality of word forms embedded in the source signal;
determining respective grammatical forms for the selected word forms based on the dictionary; and decomposing the source signal into the first internal representation based on at least some of the word forms and the respective grammatical forms;
the computer-readable medium characterized in that the step of determining respective grammatical forms for the word forms includes the steps of:
accessing the dictionary to identify one or more entries in the dictionary corresponding to a selected word form from among the word forms embedded in the source signal;

establishing whether there is a plurality of entries in the dictionary corresponding to the selected word form;
if there is a plurality of entries in the dictionary corresponding to the selected word form, then executing an arrangement of instructions, associated with the selected word form, to select the source grammatical form in one of the plurality of entries based on other word forms embedded in the source signal; and if there is a single entry in the dictionary corresponding to the selected word form, then selecting the source grammatical form in the single entry.
18. The computer-readable medium of claim 17, wherein the step of analyzing includes the step of identifying special word formations.
19. The computer-readable medium of claim 17, wherein the step of determining respective grammatical forms for the word forms includes the steps of determining a part of speech and a sub-grammatical form.
20. The computer-readable medium of claim 17, wherein said instructions are further arranged to cause the one or more processors to perform the step of storing a sequence of decomposition rules in a third computer-readable medium, each of the decomposition rules describing a condition under which the decomposition rule applies and how to partition a plurality of word forms in the source language into three data sets forming a triplet, under the condition the decomposition rule applies.
21. The computer-readable medium of claim 20, wherein the step of decomposing the source signal into the first internal representation based on at least some of the word forms and the respective grammatical forms includes the step of:
evaluating whether the condition under which one of the decomposition rules applies to a plurality of the word forms based on at least some of the word forms and the respective grammatical forms; and

if the condition under which one of the decomposition rules applies, then partitioning the plurality of the word forms into three data sets to form a triplet in accordance with said one of the decomposition rules.
22. The computer-readable medium of claim 20, wherein:
the first internal representation comprises a first hierarchical arrangement of triplets, including:
a top-level triplet partitioning word forms embedded in the source signal into three data sets thereof, and a lower-level triplet partitioning word forms embedded in a data set of a higher-level triplet into three data sets thereof; and the step of transforming the first internal representation to produce a second internal representation according to the target language includes the step of mapping at least some of the triplets in the first hierarchical arrangement to produce a second hierarchical arrangement of triplets according to the target language.
23. The computer-readable medium of claim 22, wherein said instructions are further arranged to cause the one or more processors to perform the step of storing a sequence of mapping rules in a computer-readable medium, each of the mapping rules describing a triplet according to the source language and a corresponding triplet according to the target language.
24. The computer-readable medium of claim 23, wherein the step of mapping at least some of the triplets in the first hierarchical arrangement to produce a second hierarchical arrangement of triplets according to the target language includes the step of mapping the at least some of the triplets in the first hierarchical arrangement to produce a second hierarchical arrangement of triplets based on the mapping rules.
25. The computer-readable medium of claim 24, wherein the step of mapping at least some of the triplets in the first hierarchical arrangement to produce a second hierarchical arrangement of triplets according to the target language includes the steps of:
accessing the sequence of mapping rules to find a source triplet in the first hierarchical arrangement of triplets that matches the triplet according to the source language of an accessed mapping rule;
generating a target triplet based on the corresponding triplet according to the target language; and producing the second hierarchical arrangement based on the target triplet.
26. The computer-readable medium of claim 22, wherein:
the triplets include references to at least some of the dictionary entries;
and the step of constructing the target signal based on the second internal representation and the dictionary entries includes the steps of:
traversing the second hierarchical arrangement of triplets;
while traversing the second hierarchical arrangement, accessing the dictionary to identify target word forms based on the references to at least some of the dictionary entries.
27. The computer-readable medium of claim 17, wherein the source language includes a natural language.
28. The computer-readable medium of claim 27, wherein the source signal includes any of a source digital signal representing text in the source language, a source acoustic signal representing speech in the source language, and a source optical signal representing characters in the source language.
29. The computer-readable medium of claim 27, wherein the target language includes another natural language.
30. The computer-readable medium of claim 29, wherein the target signal includes any of a target digital signal representing text in the target language, a target acoustic signal representing speech in the target language, and a target optical signal representing characters in the target language.
31. The computer-readable medium of claim 17, wherein the source language includes a computer language and the target language includes another computer language.
32. The computer-readable medium of claim 17, wherein the source language includes formatting conventions and the target language includes other formatting conventions.
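To make the determining step characterized in claims 1 and 17 easier to follow, here is a minimal sketch, assuming a flattened dictionary layout and per-word resolver functions standing in for the claimed "arrangement of instructions"; none of these names or structures come from the patent itself.

```python
# Sketch of the determining step: look up a selected word form, and if several
# dictionary entries match, run the instructions associated with that word
# form to pick a source grammatical form based on surrounding word forms.
# Entry layout and resolver functions are assumptions for illustration only.

DICTIONARY = {
    "read": [
        {"source_gform": "verb.past_participle", "target": "yuelan", "target_gform": "verb"},
        {"source_gform": "verb.present",         "target": "yuelan", "target_gform": "verb"},
    ],
    "book": [
        {"source_gform": "noun.singular", "target": "shu", "target_gform": "noun"},
    ],
}

def disambiguate_read(entries, context):
    """Toy 'arrangement of instructions' for the word form 'read'."""
    wants_participle = "have" in context or "has" in context
    for entry in entries:
        if wants_participle == entry["source_gform"].endswith("past_participle"):
            return entry
    return entries[0]

RESOLVERS = {"read": disambiguate_read}

def determine_grammatical_form(word, context):
    entries = DICTIONARY[word]
    if len(entries) == 1:                       # single entry: take it directly
        return entries[0]
    return RESOLVERS[word](entries, context)    # else run the word's instructions

print(determine_grammatical_form("read", ["I", "have", "read", "the", "book"]))
print(determine_grammatical_form("book", ["the", "book"]))
```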
CA002351406A 1998-11-19 1999-11-19 Universal translation method Abandoned CA2351406A1 (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US09/195,040 1998-11-19
US09/195,040 US6233546B1 (en) 1998-11-19 1998-11-19 Method and system for machine translation using epistemic moments and stored dictionary entries
PCT/US1999/027364 WO2000029978A1 (en) 1998-11-19 1999-11-19 Universal translation method

Publications (1)

Publication Number Publication Date
CA2351406A1 true CA2351406A1 (en) 2000-05-25

Family

ID=22719843

Family Applications (1)

Application Number Title Priority Date Filing Date
CA002351406A Abandoned CA2351406A1 (en) 1998-11-19 1999-11-19 Universal translation method

Country Status (7)

Country Link
US (1) US6233546B1 (en)
EP (1) EP1131743B1 (en)
AT (1) ATE287108T1 (en)
AU (1) AU1735400A (en)
CA (1) CA2351406A1 (en)
DE (1) DE69923216D1 (en)
WO (1) WO2000029978A1 (en)

Families Citing this family (87)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6424969B1 (en) * 1999-07-20 2002-07-23 Inmentia, Inc. System and method for organizing data
WO2001039035A1 (en) * 1999-11-17 2001-05-31 United Nations Language translation system
WO2001039034A1 (en) * 1999-11-17 2001-05-31 United Nations System for creating expression in universal language, and recorded medium on which translation rules used for the system are recorded
US6556973B1 (en) * 2000-04-19 2003-04-29 Voxi Ab Conversion between data representation formats
US20020010928A1 (en) * 2000-04-24 2002-01-24 Ranjit Sahota Method and system for integrating internet advertising with television commercials
US8296792B2 (en) * 2000-04-24 2012-10-23 Tvworks, Llc Method and system to provide interactivity using an interactive channel bug
US7702995B2 (en) 2000-04-24 2010-04-20 TVWorks, LLC. Method and system for transforming content for execution on multiple platforms
US8936101B2 (en) 2008-07-17 2015-01-20 Halliburton Energy Services, Inc. Interventionless set packer and setting method for same
US9788058B2 (en) 2000-04-24 2017-10-10 Comcast Cable Communications Management, Llc Method and system for automatic insertion of interactive TV triggers into a broadcast data stream
US7437669B1 (en) * 2000-05-23 2008-10-14 International Business Machines Corporation Method and system for dynamic creation of mixed language hypertext markup language content through machine translation
KR20020045343A (en) * 2000-12-08 2002-06-19 오길록 Method of information generation and retrieval system based on a standardized Representation format of sentences structures and meanings
WO2002054279A1 (en) * 2001-01-04 2002-07-11 Agency For Science, Technology And Research Improved method of text similarity measurement
US6944619B2 (en) * 2001-04-12 2005-09-13 Primentia, Inc. System and method for organizing data
WO2002097727A1 (en) * 2001-05-28 2002-12-05 Zenya Koono Automatic knowledge creating method, automatic knowledge creating system, automatic knowledge creating program, automatic designing method and automatic designing system
FR2825496B1 (en) * 2001-06-01 2003-08-15 Synomia METHOD AND SYSTEM FOR BROAD SYNTAXIC ANALYSIS OF CORPUSES, ESPECIALLY SPECIALIZED CORPUSES
GB2376554B (en) * 2001-06-12 2005-01-05 Hewlett Packard Co Artificial language generation and evaluation
US8214196B2 (en) 2001-07-03 2012-07-03 University Of Southern California Syntax-based statistical translation model
US6701333B2 (en) * 2001-07-17 2004-03-02 Hewlett-Packard Development Company, L.P. Method of efficient migration from one categorization hierarchy to another hierarchy
JP2003140890A (en) * 2001-10-31 2003-05-16 Asgent Inc Method and device for creating setting information of electronic equipment, method for creating security policy, and related device
WO2004001623A2 (en) 2002-03-26 2003-12-31 University Of Southern California Constructing a translation lexicon from comparable, non-parallel corpora
US20030203370A1 (en) * 2002-04-30 2003-10-30 Zohar Yakhini Method and system for partitioning sets of sequence groups with respect to a set of subsequence groups, useful for designing polymorphism-based typing assays
US7054804B2 (en) * 2002-05-20 2006-05-30 International Buisness Machines Corporation Method and apparatus for performing real-time subtitles translation
US20040098250A1 (en) * 2002-11-19 2004-05-20 Gur Kimchi Semantic search system and method
US20040158561A1 (en) * 2003-02-04 2004-08-12 Gruenwald Bjorn J. System and method for translating languages using an intermediate content space
US7181396B2 (en) * 2003-03-24 2007-02-20 Sony Corporation System and method for speech recognition utilizing a merged dictionary
US8548794B2 (en) 2003-07-02 2013-10-01 University Of Southern California Statistical noun phrase translation
US7711545B2 (en) * 2003-07-02 2010-05-04 Language Weaver, Inc. Empirical methods for splitting compound words with application to machine translation
JP3962382B2 (en) * 2004-02-20 2007-08-22 インターナショナル・ビジネス・マシーンズ・コーポレーション Expression extraction device, expression extraction method, program, and recording medium
US8296127B2 (en) 2004-03-23 2012-10-23 University Of Southern California Discovery of parallel text portions in comparable collections of corpora and training using comparable texts
US8666725B2 (en) 2004-04-16 2014-03-04 University Of Southern California Selection and use of nonstatistical translation components in a statistical machine translation framework
US7580837B2 (en) 2004-08-12 2009-08-25 At&T Intellectual Property I, L.P. System and method for targeted tuning module of a speech recognition system
JP5452868B2 (en) 2004-10-12 2014-03-26 ユニヴァーシティー オブ サザン カリフォルニア Training for text-to-text applications that use string-to-tree conversion for training and decoding
US7242751B2 (en) 2004-12-06 2007-07-10 Sbc Knowledge Ventures, L.P. System and method for speech recognition-enabled automatic call routing
US7751551B2 (en) 2005-01-10 2010-07-06 At&T Intellectual Property I, L.P. System and method for speech-enabled call routing
JP2006268375A (en) * 2005-03-23 2006-10-05 Fuji Xerox Co Ltd Translation memory system
JP2006276918A (en) * 2005-03-25 2006-10-12 Fuji Xerox Co Ltd Translating device, translating method and program
US7548849B2 (en) * 2005-04-29 2009-06-16 Research In Motion Limited Method for generating text that meets specified characteristics in a handheld electronic device and a handheld electronic device incorporating the same
US7657020B2 (en) 2005-06-03 2010-02-02 At&T Intellectual Property I, Lp Call routing system and method of using the same
US8886517B2 (en) 2005-06-17 2014-11-11 Language Weaver, Inc. Trust scoring for language translation systems
US8676563B2 (en) 2009-10-01 2014-03-18 Language Weaver, Inc. Providing human-generated and machine-generated trusted translations
US10319252B2 (en) 2005-11-09 2019-06-11 Sdl Inc. Language capability assessment and training apparatus and techniques
US8943080B2 (en) 2006-04-07 2015-01-27 University Of Southern California Systems and methods for identifying parallel documents and sentence fragments in multilingual document collections
ITGE20060061A1 (en) * 2006-06-09 2007-12-10 Engineering & Security S R L VOICE TRANSLATION METHOD AND PORTABLE INTERACTIVE DEVICE FOR THE IMPLEMENTATION OF THIS METHOD.
US8886518B1 (en) 2006-08-07 2014-11-11 Language Weaver, Inc. System and method for capitalizing machine translated text
US9053090B2 (en) * 2006-10-10 2015-06-09 Abbyy Infopoisk Llc Translating texts between languages
US9235573B2 (en) 2006-10-10 2016-01-12 Abbyy Infopoisk Llc Universal difference measure
US9633005B2 (en) 2006-10-10 2017-04-25 Abbyy Infopoisk Llc Exhaustive automatic processing of textual information
US9495358B2 (en) 2006-10-10 2016-11-15 Abbyy Infopoisk Llc Cross-language text clustering
US8145473B2 (en) 2006-10-10 2012-03-27 Abbyy Software Ltd. Deep model statistics method for machine translation
US9471562B2 (en) 2006-10-10 2016-10-18 Abbyy Infopoisk Llc Method and system for analyzing and translating various languages with use of semantic hierarchy
US8195447B2 (en) 2006-10-10 2012-06-05 Abbyy Software Ltd. Translating sentences between languages using language-independent semantic structures and ratings of syntactic constructions
US8433556B2 (en) 2006-11-02 2013-04-30 University Of Southern California Semi-supervised training for statistical word alignment
US9122674B1 (en) 2006-12-15 2015-09-01 Language Weaver, Inc. Use of annotations in statistical machine translation
US8468149B1 (en) 2007-01-26 2013-06-18 Language Weaver, Inc. Multi-lingual online community
US8615389B1 (en) 2007-03-16 2013-12-24 Language Weaver, Inc. Generation and exploitation of an approximate language model
US8831928B2 (en) 2007-04-04 2014-09-09 Language Weaver, Inc. Customizable machine translation service
US8825466B1 (en) 2007-06-08 2014-09-02 Language Weaver, Inc. Modification of annotated bilingual segment pairs in syntax-based machine translation
CN101303692B (en) * 2008-06-19 2012-08-29 徐文和 All-purpose numeral semantic library for translation of mechanical language
JP2010055235A (en) * 2008-08-27 2010-03-11 Fujitsu Ltd Translation support program and system thereof
US20100082324A1 (en) * 2008-09-30 2010-04-01 Microsoft Corporation Replacing terms in machine translation
KR101045762B1 (en) * 2008-11-03 2011-07-01 한국과학기술원 Real-time semantic annotation device and method for generating natural language string input by user as semantic readable knowledge structure document in real time
US8990064B2 (en) 2009-07-28 2015-03-24 Language Weaver, Inc. Translating documents based on content
US8380486B2 (en) 2009-10-01 2013-02-19 Language Weaver, Inc. Providing machine-generated translations and corresponding trust levels
KR101301536B1 (en) * 2009-12-11 2013-09-04 한국전자통신연구원 Method and system for serving foreign language translation
US10417646B2 (en) 2010-03-09 2019-09-17 Sdl Inc. Predicting the cost associated with translating textual content
US11003838B2 (en) 2011-04-18 2021-05-11 Sdl Inc. Systems and methods for monitoring post translation editing
US8694303B2 (en) 2011-06-15 2014-04-08 Language Weaver, Inc. Systems and methods for tuning parameters in statistical machine translation
US8935719B2 (en) 2011-08-25 2015-01-13 Comcast Cable Communications, Llc Application triggering
US8886515B2 (en) 2011-10-19 2014-11-11 Language Weaver, Inc. Systems and methods for enhancing machine translation post edit review processes
US8942973B2 (en) 2012-03-09 2015-01-27 Language Weaver, Inc. Content page URL translation
US10261994B2 (en) 2012-05-25 2019-04-16 Sdl Inc. Method and system for automatic management of reputation of translators
US9152622B2 (en) 2012-11-26 2015-10-06 Language Weaver, Inc. Personalized machine translation via online adaptation
US9414114B2 (en) 2013-03-13 2016-08-09 Comcast Cable Holdings, Llc Selective interactivity
US9460088B1 (en) * 2013-05-31 2016-10-04 Google Inc. Written-domain language modeling with decomposition
US10025778B2 (en) * 2013-06-09 2018-07-17 Microsoft Technology Licensing, Llc Training markov random field-based translation models using gradient ascent
US9213694B2 (en) 2013-10-10 2015-12-15 Language Weaver, Inc. Efficient online domain adaptation
US9330331B2 (en) 2013-11-11 2016-05-03 Wipro Limited Systems and methods for offline character recognition
RU2592395C2 (en) 2013-12-19 2016-07-20 Общество с ограниченной ответственностью "Аби ИнфоПоиск" Resolution semantic ambiguity by statistical analysis
RU2586577C2 (en) 2014-01-15 2016-06-10 Общество с ограниченной ответственностью "Аби ИнфоПоиск" Filtering arcs parser graph
US11076205B2 (en) 2014-03-07 2021-07-27 Comcast Cable Communications, Llc Retrieving supplemental content
RU2596600C2 (en) 2014-09-02 2016-09-10 Общество с ограниченной ответственностью "Аби Девелопмент" Methods and systems for processing images of mathematical expressions
US9626358B2 (en) 2014-11-26 2017-04-18 Abbyy Infopoisk Llc Creating ontologies by analyzing natural language texts
CN106383818A (en) 2015-07-30 2017-02-08 阿里巴巴集团控股有限公司 Machine translation method and device
US10346547B2 (en) * 2016-12-05 2019-07-09 Integral Search International Limited Device for automatic computer translation of patent claims
US10318634B2 (en) * 2017-01-02 2019-06-11 International Business Machines Corporation Enhancing QA system cognition with improved lexical simplification using multilingual resources
US10318633B2 (en) * 2017-01-02 2019-06-11 International Business Machines Corporation Using multilingual lexical resources to improve lexical simplification
US20230120230A1 (en) * 2021-10-20 2023-04-20 Transfluent Oy Method and system for translating source text of first language to second language

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4286330A (en) 1976-04-07 1981-08-25 Isaacson Joel D Autonomic string-manipulation system
JPH04111121A (en) * 1990-08-31 1992-04-13 Fujitsu Ltd Dictionary producer by field classification and mechanical translator and mechanical translation system using the same
US5481700A (en) 1991-09-27 1996-01-02 The Mitre Corporation Apparatus for design of a multilevel secure database management system based on a multilevel logic programming system
US5632022A (en) 1991-11-13 1997-05-20 The United States Of America As Represented By The Administrator Of The National Aeronautics And Space Administration Encyclopedia of software components
US5495413A (en) * 1992-09-25 1996-02-27 Sharp Kabushiki Kaisha Translation machine having a function of deriving two or more syntaxes from one original sentence and giving precedence to a selected one of the syntaxes
JP2821840B2 (en) * 1993-04-28 1998-11-05 日本アイ・ビー・エム株式会社 Machine translation equipment
EP0672989A3 (en) * 1994-03-15 1998-10-28 Toppan Printing Co., Ltd. Machine translation system
JP3356536B2 (en) * 1994-04-13 2002-12-16 松下電器産業株式会社 Machine translation equipment
JPH09128396A (en) * 1995-11-06 1997-05-16 Hitachi Ltd Preparation method for bilingual dictionary
US6233545B1 (en) * 1997-05-01 2001-05-15 William E. Datig Universal machine translator of arbitrary languages utilizing epistemic moments

Also Published As

Publication number Publication date
ATE287108T1 (en) 2005-01-15
US6233546B1 (en) 2001-05-15
EP1131743B1 (en) 2005-01-12
DE69923216D1 (en) 2005-02-17
WO2000029978A1 (en) 2000-05-25
EP1131743A1 (en) 2001-09-12
AU1735400A (en) 2000-06-05

Similar Documents

Publication Publication Date Title
EP1131743B1 (en) Universal translation method
Brill et al. An overview of empirical natural language processing
Sornlertlamvanich et al. Building a Thai part-of-speech tagged corpus (ORCHID)
US9798720B2 (en) Hybrid machine translation
US6760695B1 (en) Automated natural language processing
Corston-Oliver et al. An overview of Amalgam: A machine-learned generation module
Boguslavsky et al. Creating a Universal Networking Language module within an advanced NLP system
WO1997040453A1 (en) Automated natural language processing
Anwar et al. Syntax analysis and machine translation of Bangla sentences
Mousa Natural language processing (nlp)
Alshawi et al. Towards a Dictionary Support Environment for Realtime Parsing
Sethi et al. Hybridization based machine translations for low-resource language with language divergence
Hutchins Research methods and system designs in machine translation: a ten-year review, 1984-1994
Winiwarter Learning transfer rules for machine translation from parallel corpora
Papageorgiou et al. Multi-level XML-based Corpus Annotation.
Vasconcellos et al. Terminology and machine translation
Mohamed Machine Translation of Noun Phrases from English to Arabic
Marsh et al. Transporting the Linguistic String Project system from a medical to a navy domain
Agiza et al. An English-to-Arabic Prototype Machine Translator for Statistical Sentences
Lehmann et al. Human language and computers
JP3236027B2 (en) Machine translation equipment
Winiwarter WETCAT-Web-enabled translation using corpus-based acquisition of transfer rules
Hadla MACHINE-TRANSLATION APPROACHES AND ITS CHALLENGES IN THE 21 ST CENTURY.
Lane et al. Towards a bidirectional machine translator generator for multilingual communication
Takeda et al. CRITAC—An experimental system for Japanese text proofreading

Legal Events

Date Code Title Description
EEER Examination request
FZDE Discontinued