CA2145668A1 - Text input transliteration system - Google Patents
- Publication number
- CA2145668A1
- Authority
- CA
- Canada
- Prior art keywords
- text
- recited
- rule
- transliteration
- rules
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/12—Use of codes for handling textual entities
- G06F40/126—Character encoding
- G06F40/129—Handling non-Latin characters, e.g. kana-to-kanji conversion
- G06F40/151—Transformation
- G06F40/157—Transformation using dictionaries or tables
- G06F40/40—Processing or translation of natural language
- G06F40/53—Processing of non-Latin text
- G06F40/55—Rule-based translation
Abstract
A method and system for providing text input transliteration processing is disclosed. An innovative system and method for performing the transliteration is presented that performs processing on text as it is input to a computer.
Description
WO 94/25922 PCT/US94/00081
TEXT INPUT TRANSLITERATION SYSTEM
COPYRIGHT NOTIFICATION
Portions of this patent application contain materials that are subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.
CROSS-REFERENCE TO RELATED PATENT APPLICATIONS
This patent application is related to the patent application entitled Text Transliteration System, by Mark Davis and Judy Lin, filed concurrently with this application, and assigned to Taligent, the disclosure of which is hereby incorporated by reference.
Field of the Invention
This invention generally relates to improvements in computer systems and more particularly to intelligently transliterating text as it is input to a computer system.
Background of the Invention
US patent 5,148,541 discloses a multilingual database system including sorting data using a master universal sort order for all languages. The database system can be searched and retrieved by a user whether or not that data is in the user's own language. The data to be stored in the database is first encoded according to a master (or universal) sort order.
US patent 4,734,036 discloses a method and device for learning a language. The patent discusses a teaching aid for reinforcing a student's ability to learn an unfamiliar language including an upper sheet (12) marked with symbolic indicia to be taught to the student and one or more base sheets (11), each marked with a different translated version of the indicia on the upper sheet. The indicia on each base sheet are marked in registry with the corresponding indicia on the upper sheet. One edge of the base sheet is joined, temporarily or permanently, to a corresponding edge of the upper sheet to allow the upper sheet to be lifted up from the base sheet to briefly expose a corresponding translation, transliteration, interpretation, or paraphrase marked on the base sheet, then lowered again so that reading of the upper sheet can be instantly resumed.
US patent 4,547,765 discloses a method and circuit arrangement for transliteration of code words of a code having m-place code words into corresponding code words of a different code likewise having m-place code words. Individual bits of the code word to be transliterated are forwarded during serial input into an m-place shift register or during the serial output therefrom. These bits are forwarded non-negated or negated from register stage to register stage over a respective forwarding circuit depending upon the measure or criterion of coincidence or non-coincidence between the code word to be transliterated and the code words of the different code. This occurs in such manner that the traversing bits experience a respective negation in front of and after a register stage whose position within the shift register corresponds to the position of non-coinciding bits within the two code words.
Systems such as the Apple® Macintosh® or Microsoft® Windows™ have dead keys which are employed to extend the range of the keyboard for accented characters. With this mechanism, a user can type a key (e.g. option-u for umlaut) which puts the keyboard into a special state, but does not generate a character or any other visible indication of what has occurred. When the user then types a base character (one that combines with the accent), the keyboard generates the resulting accented character (for example, typing option-u, e produces ë). However, this approach requires a user to be cognizant of particular special keys associated with a particular task.
Summary of the Invention
Accordingly, it is a primary objective of the present invention to provide a set of flexibly defined rules stored in data structures in a computer system to automatically apply user specified transliterations to text as it is input to a computer system.
Brief Description of the Drawings
Figure 1 is a block diagram of a personal computer system in accordance with a preferred embodiment;
Figure 2 is a flowchart of the logic used to transliterate text between offsets start and finish in accordance with a preferred embodiment;
Figure 3 is a flowchart of the logic used to identify a type-in starting point in accordance with a preferred embodiment;
Figure 4 is a flowchart of the logic used to provide type-in transliteration in accordance with a preferred embodiment;
Figure 5 is an illustration of a display in accordance with a preferred embodiment; and
Figure 6 is an illustration of a transliteration operation as it would appear on a user's display in accordance with a preferred embodiment of the invention.
Detailed Description Of The Invention
The invention is preferably practiced in the context of an operating system resident on a personal computer such as the IBM® PS/2® or Apple® Macintosh® computer. A representative hardware environment is depicted in Figure 1, which illustrates a typical hardware configuration of a workstation in accordance with the subject invention having a central processing unit 10, such as a conventional microprocessor, and a number of other units interconnected via a system bus 12. The workstation shown in Figure 1 includes a Random Access Memory (RAM) 14, Read Only Memory (ROM) 16, an I/O adapter 18 for connecting peripheral devices such as disk units 20 to the bus, a user interface adapter 22 for connecting a keyboard 24, a mouse 26, a speaker 28, a microphone 32, and/or other user interface devices such as a touch screen device (not shown) to the bus, a communication adapter 34 for connecting the workstation to a data processing network and a display adapter 36 for connecting the bus to a display device 38. The workstation has resident thereon an operating system such as the Apple System/7® operating system.
KEYBOARD TRANSLITERATORS
On the Apple Macintosh computer, dead keys are used to extend the range of the keyboard for accented characters. With this mechanism, a user can type a key (e.g. option-u for umlaut) which puts the keyboard into a special state, but does not generate a character or any other visible indication of what has occurred. When the user then types a base character (one that combines with the accent), the keyboard generates the resulting accented character (e.g. option-u, e produces ë).
Dead-Key Example
Key        Deadkey state   Display
b          <none>          b
option-u   <umlaut>        b
a          <none>          bä
d          <none>          bäd

In a preferred embodiment of this invention, the modal mechanism is replaced by the use of transliterators. When an application inserts characters from a keyboard into some text, it will call the list of transliterators associated with that keyboard, and the input method for the keyboard. An accent transliterator can provide the same functionality as dead keys. When an accent is typed, it will be combined with the previous "base" character if possible. For example:

Key        Pre-transliterate   Display
b          b                   b
a          ba                  ba
option-u   ba¨                 bä
d          bäd                 bäd

Transliterators also perform many other functions. For example, they can replace generic quotes (", ') by righthand and lefthand quotes (“, ‘, ’, ”). They can also be used to perform general script transcriptions for any cases where the transcription is simple and unambiguous, as when converting from romaji to katakana or hiragana for Japanese, converting jamo (letter components) to hangul (letter syllables) for Korean, or converting QWERTY to Hebrew, etc. By convention, an apostrophe can be used to prevent characters from being transliterated together. For example, to form "ba d", one would type "ba' d".
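The accent behaviour described above can be illustrated with a short C++ sketch. It is only an illustration under stated assumptions: the names `typeCharacter`, `kDiaeresisForms` and `kCombiningDiaeresis` are hypothetical, the accent is modelled as the Unicode combining diaeresis, and the table holds just four entries.

```cpp
#include <map>
#include <string>

// Hypothetical table: base letter -> precomposed letter with diaeresis.
// Only a few entries are listed; a real table would be far larger.
static const std::map<char32_t, char32_t> kDiaeresisForms = {
    {U'a', U'\u00E4'}, {U'e', U'\u00EB'}, {U'o', U'\u00F6'}, {U'u', U'\u00FC'},
};

// Assumption: the accent key delivers U+0308 (combining diaeresis).
constexpr char32_t kCombiningDiaeresis = U'\u0308';

// Appends one typed character to `text`. If the character is the accent and
// the previous character has a precomposed form, the two are merged in place;
// otherwise the character is appended unchanged.
void typeCharacter(std::u32string& text, char32_t typed) {
    if (typed == kCombiningDiaeresis && !text.empty()) {
        auto it = kDiaeresisForms.find(text.back());
        if (it != kDiaeresisForms.end()) {
            text.back() = it->second;
            return;
        }
    }
    text.push_back(typed);
}
```

Unlike a dead key, nothing here depends on hidden keyboard state: the accent is typed after the base character and the text itself is rewritten.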
INPUT TRANSLITERATION
Transliteration can also be used to phonetically convert between different languages. This feature is especially important for languages such as Japanese that use a Roman keyboard to key in text, which is then transcribed into native Japanese characters. These characters can also be converted back into Roman characters. A particular class of transliterations called input transliterations obeys the two requirements set forth below.
Uniqueness
Transcription from native to foreign script is unambiguous. Two different native strings cannot correspond to the same foreign string. For example, if the native script distinguishes between a retroflex and a dental T, then a transliteration cannot map them onto the same symbol "t".
Completeness
Transcription from native to foreign script, or from foreign to native, is complete. Every sequence of native symbols will map onto some string of foreign symbols. Transcription from foreign to native script should be complete, but is not generally unambiguous. For example, when a Roman-to-Japanese transcription is used, "ra" and "la" map onto the same Japanese symbol.
A TTransliterator object is used to perform transliterations. Input transliterators are composed of a set of context-sensitive rules. These rules are designed to allow non-programmers to edit them reasonably for localization.
Examples of rules:
cho => ちょ
t[t => っ
to => と
Using these rules, chotto can be transliterated into ちょっと.
Transliteration may be dependent not only on script but also on language. It is also inherently an n x n problem: expecting to transliterate from Russian to Hindi by using a series of Cyrillic-Roman and Roman-Hindi transliterations is doomed to failure, since the transcriptions used to represent non-Roman letters will vary depending on the script being represented: in some cases th will represent the sound found in thick, while in others it is an aspirated t.
A preferred embodiment provides input transliteration from Roman to Japanese (Hiragana and Katakana), Russian, Greek, Arabic, Devanagari (Hindi), and Hebrew. There is also a "Symbols" transliterator which allows a user to enter any Unicode symbol by name, e.g., "apple-logo", and transcribe it to the actual character.
Transliterators can be chained. For example, one may wish to have a "smart-quote" transliterator be the first in chain, followed by an input transliterator. This mechanism is managed by the TTypingConfiguration object.
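Chaining can be pictured as function composition. The following C++ sketch is an assumption-laden illustration, not the TTypingConfiguration mechanism itself: `Transliterator`, `runChain`, `smartQuotes` and `replaceAll` are hypothetical names, text is plain UTF-8 in a `std::string`, and the second stage is a stand-in for an input transliterator.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stage type: takes text, returns transformed text.
using Transliterator = std::function<std::string(const std::string&)>;

// Applies each stage in order; the output of one stage feeds the next.
std::string runChain(const std::vector<Transliterator>& chain, std::string text) {
    for (const auto& stage : chain) {
        text = stage(text);
    }
    return text;
}

// Illustrative first stage: alternate generic double quotes between
// left (U+201C) and right (U+201D) quotes, encoded as UTF-8.
std::string smartQuotes(const std::string& in) {
    std::string out;
    bool open = true;
    for (char c : in) {
        if (c == '"') {
            out += open ? "\xE2\x80\x9C" : "\xE2\x80\x9D";
            open = !open;
        } else {
            out += c;
        }
    }
    return out;
}

// Helper for a stand-in second stage: replaces every occurrence of `from`
// (which must be non-empty) by `to`, without rescanning replaced text.
std::string replaceAll(std::string text, const std::string& from, const std::string& to) {
    for (std::size_t pos = text.find(from); pos != std::string::npos;
         pos = text.find(from, pos + to.size())) {
        text.replace(pos, from.size(), to);
    }
    return text;
}
```

A chain is then an ordered list such as `{smartQuotes, [](const std::string& s) { return replaceAll(s, "ph", "f"); }}`; reordering the list changes which stage sees the raw keystrokes.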
TEXT TRANSFORMATION THROUGH TRANSLITERATION
Transliteration can also be used for language-specific processing such as converting text from upper to lower case and back, creating title text (for example, "With the Important Words Capitalized"; title text requires use of a title-filter text service in order to eliminate articles. Otherwise, the titled text will appear as: "With The Important Words Capitalized."), and stripping diacritical marks. Note that these results are achieved by modifying the text objects directly. If the transformation is only desired for display, and should not affect the actual text objects, there is an alternative method employed using metamorphosis tables in the font.
TRANSLITERATOR DESKTOP OBJECTS
Transliterators are desktop objects. For example, a user would add a transliterator to his typing configuration by dragging it from the desktop to a typing configuration encapsulator. TTransliteratorModel encapsulates, accesses and manages the transliterator data. TTransliteratorUserInterface is responsible for the icon and thumbnail presentations as well as the user interface to edit the data. A TModelSurrogate is used as a stand-in for the model, and will typically be embedded in the typing configuration model.
Programmatic access to all available transliterators: TTransliterator::GetAvailableTransliterators(TCollection&) returns a collection of TModelSurrogate objects.
Identifying transliterators: Transliterator objects are stored as TFiles. The Pluto attribute is used to identify a transliterator.
Transliteration Internals
Background
Rule-based transliteration is designed so that relatively unsophisticated users and localizers are able to create and modify rules. The rules are designed to be straightforward, and to apply especially to the case of transliteration as the user types. Although designed to meet the specific needs of transcribing text between different scripts, either during user type-in or converting a range of text in a document, transliteration uses a general-purpose design which is applicable to a wide range of tasks.
A transliteration rule consists of two main parts: a source and a result. The source can be accompanied by two strings which specify the context in which the conversion is to be made. Every rule must have a source field, but the other fields may be empty. For example:
Simple Rule
preceding context   source   succeeding context   result
                    c        i                    s

Notice above, a c is turned into an s, but only if it is followed by an i.
Variables can be used to have multiple matches for characters in the context. For example:
Rule with Variables
preceding context   source   succeeding context   result
                    c        --                   s

Simple Inclusive Variable
Variable   Meaning
--         eiy

A c is turned into an s, but only if it is followed by an e, i or y. There are also exclusive variables, which match if the text character is not present in the variable's contents. An exclusive variable with empty contents will match any character. In normal operation, once a string has been replaced, none of the replacement characters are subsequently checked for matches. However, an additional rechecked result field can be specified that will be rechecked for matches and possibly modified. An example is presented below.

Rule with rechecked result
preceding context   source   succeeding context   result   rechecked result
k ~ t taa fi

In this case, the sequence kaa will be converted into ~;fi. In the Indic languages, this can be used to capture the interactions between consonants and vowels in a general way.
In the text, rules are written in the following format. The three matching fields are separated from the two resulting fields by an arrow, with the contexts and rechecked result distinguished by strike-through from the adjacent fields.
Example~ ) ab;
Example: 3~ 0 ab;
A Transliteration is designed to hold two sets of rules, so that it can provide transliteration in both directions, such as from Katakana to Latin, and back.
Definitions
The following definitions are used in describing the internal matching process used in a preferred embodiment. There are two different ways to apply Transliteration: one is to a range of text, and the other is on type-in. The definitions apply to both cases.
A rule rj is made of preceding context, source, succeeding context, result and rechecked result. The preceding context, source, and succeeding context are referred to collectively as the source fields, and the result and rechecked result are referred to collectively as the result fields. In the following, the length of each part is abbreviated by length(pc), length(s), etc.
A character ck in the text matches a character cr in the rule if and only if one of the following holds:
- cr is not a variable and ck = cr (this is strong, bitwise identity);
- cr is an inclusive variable and ck ∈ contents(cr);
- cr is an exclusive variable and ck ∉ contents(cr).
A rule matches the text within a range at offset i if and only if all of the following hold:
- the length(pc) characters before i match the preceding context;
- the length(s) characters after i match the source;
- the length(sc) characters, from i + length(s) on, match the succeeding context.
| a | b | c | d | e | f | g | h |
The text set forth above is used in the following examples.
Example: bcd → z matches at offset 1; and bc~ → z matches at offset 2.
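The matching definitions above can be sketched in C++ as follows. This is a minimal illustration under stated assumptions: `Rule`, `Variable`, `Variables`, `charMatches`, `regionMatches` and `matchesAt` are hypothetical names, characters are plain `char`, a variable is represented by a reserved character that is looked up in a map, and offsets count the gaps between characters (offset 0 is before the first character).

```cpp
#include <cstddef>
#include <map>
#include <string>

// A variable is a set of characters; `exclusive` inverts the membership test.
struct Variable {
    std::string contents;
    bool exclusive;
};

struct Rule {
    std::string precedingContext;
    std::string source;
    std::string succeedingContext;
    std::string result;
};

// Assumption: a rule character that appears as a key here is a variable.
using Variables = std::map<char, Variable>;

// True if text character ck matches rule character cr.
bool charMatches(char ck, char cr, const Variables& vars) {
    auto it = vars.find(cr);
    if (it == vars.end()) return ck == cr;  // literal: exact identity
    bool inContents = it->second.contents.find(ck) != std::string::npos;
    return it->second.exclusive ? !inContents : inContents;
}

// True if `pattern` matches the text starting at `pos` and fits inside it.
bool regionMatches(const std::string& text, std::size_t pos,
                   const std::string& pattern, const Variables& vars) {
    if (pos + pattern.size() > text.size()) return false;
    for (std::size_t k = 0; k < pattern.size(); ++k) {
        if (!charMatches(text[pos + k], pattern[k], vars)) return false;
    }
    return true;
}

// True if the rule matches the text at offset i: preceding context before i,
// source from i, succeeding context from i + length(source).
bool matchesAt(const std::string& text, std::size_t i, const Rule& r,
               const Variables& vars) {
    if (i < r.precedingContext.size()) return false;  // not enough text before i
    return regionMatches(text, i - r.precedingContext.size(), r.precedingContext, vars)
        && regionMatches(text, i, r.source, vars)
        && regionMatches(text, i + r.source.size(), r.succeedingContext, vars);
}
```

With the sample text abcdefgh, a rule whose source is bcd matches at offset 1, and a rule with preceding context b and source cd (an assumed reading, since the strike-through marking is not visible here) matches at offset 2.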
A rule matches the text within a range at offset i up to offset j if and only if you could add (zero or more) characters after j that would cause the rule to match at i.
Example: bcde → z matches at offset 2 up to offset 4.
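A literal reading of the "up to" definition can be sketched as below. The names `Rule` and `matchesUpTo` are hypothetical, variables are omitted for brevity, and the text is treated as if it ended at offset j, so rule characters that would fall at or beyond j are assumed to be supplied later.

```cpp
#include <cstddef>
#include <string>

struct Rule {
    std::string precedingContext;
    std::string source;
    std::string succeedingContext;
    std::string result;
};

// True if the rule matches at offset i when the text is treated as ending at
// offset j, i.e. some continuation after j could complete the match.
bool matchesUpTo(const std::string& text, std::size_t i, std::size_t j, const Rule& r) {
    if (j > text.size() || i > j) return false;
    const std::string& pc = r.precedingContext;
    // The preceding context lies wholly before i, so it must match in full.
    if (i < pc.size() || text.compare(i - pc.size(), pc.size(), pc) != 0) return false;
    // Compare only the part of source + succeeding context that lies before j.
    const std::string pattern = r.source + r.succeedingContext;
    for (std::size_t k = 0; k < pattern.size() && i + k < j; ++k) {
        if (text[i + k] != pattern[k]) return false;
    }
    return true;
}
```

For the sample text abcdefgh, a rule with preceding context b and source cde (again an assumed reading of the example) matches at offset 2 up to offset 4, because the missing e could still be typed.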
A rule spans offset i in the text if and only if there is some j < i such that the rule matches the text at j up to i.
Example: ~cdef → g~ matches at offset 2, and spans offsets l through 5.
Example: defx → i spans offsets 4 and 5.
(The text characters def match the initial part of the rule, so the offsets between those characters are spanned, even though the x doesn't match.)
Basic Operation on Ranges
With a range of text, only the characters within the range are matched or replaced; the characters outside of the match are completely ignored. The Transliteration operation proceeds as follows. Iterate through the offsets in the range one by one. For each offset i, check through the list of rules in the transliteration for the matching rules at that offset. If there are no matching rules, then continue iterating. If there is more than one matching rule, then pick the best match as follows:
Figure 2 is a flowchart setting forth the detailed logic of transliterating text between offsets start and finish in accordance with a preferred embodiment of the invention. Processing commences at function block 200 where index i is initialized to point to the start of the text that is to be transliterated. A test is performed at decision block 210 to determine if the index i has surpassed the finish of the text. If all the text has been processed, then processing is terminated at terminal 220. If text remains to be processed at decision block 210, then currentrule is zeroed, the index j is equated with the first rule found using text(i) as an index, and jmax is equated with the last rule found using text(i) as an index as shown in function block 230.
At decision block 240, a test is performed to determine if the index j has exceeded jmax. If so, then a test is performed at decision block 260 to determine if currentrule is equal to zero. If so, then at function block 270, characters are replaced as indicated and indexes are reset before passing control to decision block 240. If the index j has not exceeded jmax at decision block 240, then Rule(j) is compared to the character located at index i at decision block 242. If they match, then another test is performed at decision block 244 to determine if Rule(j) is better than currentRule. If so, then currentRule is equated to Rule(j) at function block 250, j is incremented at function block 246, and control is passed to decision block 240. If there is not a match at decision block 242, then j is incremented at function block 246 and control is passed to decision block 240.
In the processing described above, a match x is strictly better than a match y if and only if either:
- the preceding context of x is longer than the preceding context of y, and the source + succeeding context of x is at least as long as the source + succeeding context of y; or
- the preceding context of x is at least as long as the preceding context of y, and the source + succeeding context of x is longer than the source + succeeding context of y.
Of the matching rules, eliminate all those where there is another matching rule which is strictly better. Of the remainder, pick the first one. (Note that rules are inserted into a Transliteration in order, and that order may be significant.)
Example: abc → p is better than ab → q; yab → r is also better than ab → q; however, neither abc → p nor yab → r is better than the other.
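The comparison and selection just described can be sketched in C++ as below. The names `Rule`, `tailLength`, `strictlyBetter` and `pickBest` are hypothetical, and the sketch assumes that in the rule yab → r the y is a preceding context.

```cpp
#include <cstddef>
#include <string>
#include <vector>

struct Rule {
    std::string precedingContext;
    std::string source;
    std::string succeedingContext;
    std::string result;
};

// Length of source plus succeeding context.
std::size_t tailLength(const Rule& r) {
    return r.source.size() + r.succeedingContext.size();
}

// True if match x is strictly better than match y: at least as long in both
// measures and longer in at least one.
bool strictlyBetter(const Rule& x, const Rule& y) {
    const std::size_t xp = x.precedingContext.size();
    const std::size_t yp = y.precedingContext.size();
    const std::size_t xt = tailLength(x);
    const std::size_t yt = tailLength(y);
    return (xp > yp && xt >= yt) || (xp >= yp && xt > yt);
}

// Returns the index of the first rule in `matching` that no other matching
// rule strictly beats. Returns matching.size() only if `matching` is empty.
std::size_t pickBest(const std::vector<Rule>& matching) {
    for (std::size_t a = 0; a < matching.size(); ++a) {
        bool beaten = false;
        for (std::size_t b = 0; b < matching.size(); ++b) {
            if (b != a && strictlyBetter(matching[b], matching[a])) {
                beaten = true;
                break;
            }
        }
        if (!beaten) return a;
    }
    return matching.size();
}
```

Because "strictly better" is only a partial order, two rules such as abc → p and yab → r can both survive elimination; insertion order then decides between them.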
In order to speed up matching of rules, the collection of rules is indexed by the first character in the source of each rule. This does not affect the ordering of the rules, since any two rules that do not share the same first character of the source will never match at the same time, and thus never conflict. Rules with variables in the first position are resolved at the time that they are added to a Transliteration: that is, if the variable has n characters in its contents, then n different rules with the different first letters are added to the Transliteration. When processing forward (either ranges or type-in), the rules are looked up by this first character, then sequentially accessed.
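A minimal sketch of such an index follows. It makes several assumptions: `Rule`, `Variables`, `RuleIndex` and `addRule` are hypothetical names, a variable is a reserved character mapped to its contents, and only inclusive variables are handled.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

struct Rule {
    std::string precedingContext, source, succeedingContext, result;
};

// Assumption: variable symbol -> contents (inclusive variables only).
using Variables = std::map<char, std::string>;
// First source character -> rules sharing it, kept in insertion order.
using RuleIndex = std::map<char, std::vector<Rule>>;

// Adds a rule to the index. A variable in the first source position is
// resolved immediately: one concrete rule is stored per character in the
// variable's contents.
void addRule(RuleIndex& index, const Rule& rule, const Variables& vars) {
    if (rule.source.empty()) return;  // every rule must have a source
    auto var = vars.find(rule.source[0]);
    if (var == vars.end()) {
        index[rule.source[0]].push_back(rule);
        return;
    }
    for (char c : var->second) {
        Rule concrete = rule;
        concrete.source[0] = c;
        index[c].push_back(concrete);
    }
}
```

Lookup during forward processing is then a single map access on the text character at the current offset, followed by a sequential scan of that bucket.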
Once a matching rule is identified, a replacement is performed. The source of the matching rule is replaced by the result fields (result + rechecked result). This may change the length of the text, since the replacement may differ in length from the source. Resume iterating, starting at the offset i + length(result). This will mean that the rechecked result may be matched and modified. (The plain result can match against the preceding context of another rule, but will not be matched against the source, and thus cannot be subsequently modified.)
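Putting the pieces together, the range operation can be sketched as a single loop. This is an illustration under assumptions, not the patented implementation: `Rule`, `matchesAt`, `strictlyBetter` and `transliterate` are hypothetical names, variables and the first-character index are left out, the range is the whole string, and the best rule is chosen in one pass in the manner of the Figure 2 description (the current rule is replaced whenever a strictly better match is found).

```cpp
#include <cstddef>
#include <string>
#include <vector>

struct Rule {
    std::string precedingContext, source, succeedingContext;
    std::string result, recheckedResult;
};

// True if the rule matches `text` at offset i (literal characters only).
static bool matchesAt(const std::string& text, std::size_t i, const Rule& r) {
    const std::string& pc = r.precedingContext;
    const std::string tail = r.source + r.succeedingContext;
    return i >= pc.size()
        && text.compare(i - pc.size(), pc.size(), pc) == 0
        && text.compare(i, tail.size(), tail) == 0;
}

static bool strictlyBetter(const Rule& x, const Rule& y) {
    const std::size_t xp = x.precedingContext.size(), yp = y.precedingContext.size();
    const std::size_t xt = x.source.size() + x.succeedingContext.size();
    const std::size_t yt = y.source.size() + y.succeedingContext.size();
    return (xp > yp && xt >= yt) || (xp >= yp && xt > yt);
}

// Transliterates the whole string. Caveat: a rule whose rechecked result
// re-triggers the same rule would loop forever; that is not guarded here.
std::string transliterate(std::string text, const std::vector<Rule>& rules) {
    std::size_t i = 0;
    while (i < text.size()) {
        const Rule* best = nullptr;
        for (const Rule& r : rules) {
            if (r.source.empty() || !matchesAt(text, i, r)) continue;
            if (best == nullptr || strictlyBetter(r, *best)) best = &r;
        }
        if (best == nullptr) {
            ++i;  // no rule matches here; move to the next offset
            continue;
        }
        text.replace(i, best->source.size(), best->result + best->recheckedResult);
        i += best->result.size();  // skip the result; the rechecked part stays ahead of i
    }
    return text;
}
```

Resuming at i + length(result) is what makes the two result fields behave differently: the plain result is never rescanned as a source, while the rechecked result is the next thing the loop looks at.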
Operating on Type-In
When transliteration is applied to type-in, the operation is somewhat different. The goal is to produce the same results as would have occurred had the user typed in all of the text without transliteration, then converted it with the range conversion above. In addition, text is converted as it is entered.
It is difficult to predict what characters will follow after an initial text entry, so the process of transliteration cannot be completed until the user has entered additional characters. Example: suppose that there are rules ph → j and p → p. If the user has just typed in a p, it is impossible to finalize a match because an ambiguity between the two rules exists. When the user types a new character, that character may be modified, and preceding characters may also change, since a unique rule may be specified.
The other complication is that it is impossible to predict the starting point of the range, because the user may have just changed to the current transliteration, or just clicked in a new location. So the only text converted is text that could not have been converted without the additional characters. Also, if the user is inserting text, all characters after the insertion point are always ignored, so the operation always behaves as if it is at the end of the text.
Figure 3 is a flowchart setting forth the detailed logic of transliteration processing between two portions of a text delimited by textStart and textFinished in accordance with a preferred embodiment of the invention. Processing commences at function block 300 where an index i is initialized, and imin is initialized. Then, at decision block 310, a test is performed to determine if i < imin. If not, then i+1 is returned at terminal block 320. If i is less than imin, then at function block 330, the currentrule variable is set equal to zero, j is equated to the rule found using i as an index into text for finding the first rule, and jmax is equated to the rule found using i as an index into text for finding the last rule. Then, at decision block 340, a test is performed to determine if j is greater than jmax. If so, then i+1 is returned at terminal block 320. If not, then a test is performed at decision block 350 to determine if rule(j) matches from i up to start after counting characters. If so, then i is incremented and processing passes to decision block 310. If not, then j is incremented and control is passed to decision block 340 for further processing.
The operation proceeds as follows. Suppose that a user types a key which causes n characters to be inserted at offset i. After inserting the characters at offset i, check through all of the rules that span i. Find the smallest offset s that corresponds to the start of the source of one of these spanning rules. (Indexing by first character of source does not help in this case, since a backward search is performed, and the current character may be any character in the source or succeeding context of one of these rules. To handle this, the transliteration stores a number maximumBackup, which is the length of the longest (source + succeeding context - 1). This is the furthest point back that a spanning rule could start at. The transliteration then searches from i - maximumBackup forward to find the first rule that spans the desired index, which determines the index s.)
Figure 4 is a flowchart setting forth the detailed logic of type-in transliteration in accordance with a preferred embodiment of the invention. Processing commences at function block 400 where the type-in starting point is determined. Then, at decision block 410, a test is performed to determine if the index i is greater than or equal to finish. If not, then processing is completed at terminal 412. If so, then at function block 414, currentrule is equated to zero, the index j is equated to the first rule indexed by the index i, and jmax is set equal to the last rule indexed by the index i. Then, another test is performed at decision block 416 to determine if j is greater than jmax. If not, then a test is performed at decision block 414 to determine if Rule(j) spans start. If so, then processing is completed, and control is passed to terminal 412. If so, then another test is performed at decision block 430 to determine if currentrule is equal to zero. If so, then the index i is incremented and control is passed to decision block 410. If not, then at function block 440, characters are replaced in accordance with the current rule, and processing is passed to decision block 410.
Once the smallest offset s is identified, identify all the rules that match at s up to i + n. If any of these rules also span i + n, then no conversions are performed. Otherwise, identify the best match within the transliteration ranges, and perform the substitution. Then, reset s according to the result and repeat this processing until done.
| a | b | c | d | g | h | i | j |
In all the following examples, ef has been inserted at offset 4 in the above text.
Example: with the rules (cd → x; de → y), s = 3; so convert the de to y, and do not convert the cd.
Example: with the rules (cdm → x; cd → y), s = 2; so convert the cd to y, and do not affect the e.
Example: with the rules (abcdem → x; ab → y; bc → z; bcm → w; de → v), s = 1; so convert the bc to z, and the de to v.
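The maximumBackup bound used by the backward search described earlier in this section can be computed once, when rules are added. The sketch below assumes hypothetical names (`Rule`, `maximumBackup`, `searchStart`) and covers only the bound, not the full type-in procedure.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

struct Rule {
    std::string precedingContext, source, succeedingContext, result;
};

// Longest (source + succeeding context - 1) over all rules: the furthest
// distance before an offset at which a spanning rule's source could start.
std::size_t maximumBackup(const std::vector<Rule>& rules) {
    std::size_t longest = 0;
    for (const Rule& r : rules) {
        const std::size_t span = r.source.size() + r.succeedingContext.size();
        if (span > 0) longest = std::max(longest, span - 1);
    }
    return longest;
}

// First offset the forward scan needs to consider for an insertion at
// offset i, clamped so it never runs before the start of the text.
std::size_t searchStart(std::size_t i, std::size_t backup) {
    return i > backup ? i - backup : 0;
}
```

With the rules cdm → x and cd → y from the second example, the bound is 2, so an insertion at offset 4 needs the scan to begin no earlier than offset 2, which is the s reported there.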
Human Interface
An example of the user interface display used to create transliteration rules is presented in Figure 5. This Figure shows a transliteration display in the process of being created in accordance with a preferred embodiment of the invention. The rules are in the top box, with context variables in the center, and test samples at the bottom.
New rules can be added at the end, or inserted in the proper order. If rules do not conflict, or if any rule is inserted after a "better" rule, then it can be automatically reordered. For example, if the rule a → x exists, and the user adds the rule ab → y, then the inserted rule is reordered before the existing rule since otherwise it would have no effect. New context variables can be added at the end, because order is unimportant. The test source is transliterated into the result whenever any change is made, so that the user can check the correctness of the actions.
Figure 6 is an illustration of a transliteration operation as it would appear on a user's display in accordance with a preferred embodiment of the invention. At label 610, a sentence is typed in with transliteration disabled. Label 680 shows the sentence appearing at label 610 after it has been selected and fully transliterated.
Labels 620 to 670 show what the user sees as she types in successive characters from label 610 with input transliteration enabled. At label 620, the user has entered the character "c". At label 630, the user has typed in "ch". At label 640, the user has typed "cho", and there is enough context to invoke the appropriate rule and transliterate the text input into "ちょ". A similar operation is performed on the additional examples appearing at labels 650 to 680.
TRANSLITERATOR CLASS AND METHOD DESCRIPTION
Class TTransliterator
TTransliterator is an abstract base class that transforms text based upon some well-defined rules or algorithm. It descends from TTextModifier, and can be used in conjunction with a word processing engine for inline transliteration during typing. Subclasses of TTransliterator can be used to perform intra-script transformation such as accent composition and upper or lower casing, or inter-script phonetic transcription. Inter-script transliteration provides an alternative way of entering non-Roman text. For example, instead of using a Hebrew keyboard, Hebrew text can be entered with an American keyboard using Roman to Hebrew transliteration. This class is also used to aid typing by providing smart quotes and other punctuation modifiers. TTransliterator provides methods for translating text as well as reversing the effect. These methods must be overridden by the concrete subclasses. TTransliterator is designed to enable chaining multiple objects together. One or more such objects can work within the editable text classes to allow for transliteration during typing.
The replacement text will also have the correct styles. E.g., if the source text is "aeiou", and the replacements are "Æ" for "a", "I" for "i", "O" for "o" and "U" for "u", then the replacement text will be: "ÆIOU". If a replacement for some characters is a larger number of characters, then the last style will be extended. For example, if the source is "Why", and the replacement is "VV" for "W", then the replacement text is: "VVhy". The "C" source code used to implement a preferred embodiment in accordance with the description above is set forth below.
Public Methods

/* TTextModifier overrides */

virtual Boolean WantEvent(const TEvent& event) const = 0;
/* Override to always return FALSE. */

virtual Boolean ProcessEvent(const TEvent& event,
                             const TModelCommand& currentCommand) = 0;
/* Override to always return FALSE. */

virtual Boolean ProcessNewText(const TBaseText& newText,
                               const unsigned long numDeletes,
                               const TInsertionPoint& insertionPoint,
                               TTextRange& rangeToDelete,
                               TBaseText& textToInsert) = 0;
/* Override to call Translate(). */

virtual void ProcessNewSelection(const TInsertionPoint& insertionPoint,
                                 unsigned long length) = 0;
/* Override to do nothing. Transliterators don't care about new selection. */
/*
 * These methods directly translate (or translate back) the TBaseText argument.
 * Subclass: each method calls the appropriate Translate() or TranslateBack() method and should not be overridden.
 */
Boolean SimpleTranslate(TBaseText& text) const;
Boolean SimpleTranslateBack(TBaseText& text) const;
/*
 * Translate() takes a sourceText, and a range within that text, and either modifies the text directly, or produces the replacement text and the range of characters to replace. In the latter case, the replacement range is always a subset of the source range. It is up to the caller to call the appropriate text methods to actually substitute the replacement. The actual transformation is defined by a concrete subclass.
 */
virtual Boolean Translate(const TBaseText& sourceText,
                          const TTextRange& sourceRange,
                          TBaseText& replacementText,
                          TTextRange& replacementTextRange) const = 0;

virtual Boolean Translate(TBaseText& sourceText,
                          const TTextRange& sourceRange,
                          TTextRange& replacementTextRange,
                          unsigned long& numNewChars) const = 0;
/* This method is similar to the one above, the difference being that, given a text object and a range within it, this method will directly change the text object. */
/*
 * TranslateBack() takes the same arguments, but translates in the reverse direction, from target characters back to source characters using a translation mechanism defined by a concrete subclass. Note that these methods are not suitable for use with keyboard entry.
 */
virtual Boolean TranslateBack(const TBaseText& sourceText,
                              const TTextRange& sourceRange,
                              TBaseText& replacementText,
                              TTextRange& replacementTextRange) const = 0;

virtual Boolean TranslateBack(TBaseText& sourceText,
                              const TTextRange& sourceRange,
                              TTextRange& replacementTextRange,
                              unsigned long& numNewChars) const = 0;
Class TRuleBasedTransliterator
TRuleBasedTransliterator is derived from TTransliterator. It uses a set of context-sensitive rules to transform text, and a parallel set to reverse this action. These rules are designed such that a knowledgeable non-programmer can edit them for localization. Roman rule-based transliteration is available for Japanese (kana), Hebrew, Arabic, Greek, Russian, and Devanagari.
TRuleBasedTransliterator also has the capability of specifying a range variable. These variables can then be used in the rules to provide some simple pattern-matching functions. An example of the foregoing is presented below.
Public Methods

/* Constructor */
TRuleBasedTransliterator(const TFile& rulesFile);
/* Instantiate a transliterator based on the rules in a file. */

/*
 * TTransliterator overrides. Translate() uses a set of context-sensitive rules to perform the translation. See TTransliterateRule for more details. Rules are applied according to the principles of Partial and Multiple Replacements, as in the discussion above.
 */
virtual Boolean Translate(const TBaseText& sourceText,
                          const TTextRange& sourceRange,
                          TBaseText& replacementText,
                          TTextRange& replacementTextRange) const;

virtual Boolean Translate(TBaseText& sourceText,
                          const TTextRange& sourceRange,
                          TTextRange& replacementTextRange,
                          unsigned long& numNewChars) const;
/*
 * TranslateBack() uses a set of context-sensitive rules to perform the transliteration. It takes the same arguments as Translate(), but translates in the reverse direction, from target characters back to source characters. For example, with a Roman to Hebrew transliterator, Translate() will go from Roman to Hebrew; TranslateBack() will go from Hebrew to Roman.
 */
virtual Boolean TranslateBack(const TBaseText& sourceText,
                              const TTextRange& sourceRange,
                              TBaseText& replacementText,
                              TTextRange& replacementTextRange) const = 0;

virtual Boolean TranslateBack(TBaseText& sourceText,
                              const TTextRange& sourceRange,
                              TTextRange& replacementTextRange,
                              unsigned long& numNewChars) const = 0;
/* Add and remove rules. */
virtual void AddTranslationRule(const TTransliterateRule& rule);
virtual void AddTranslationBackRule(const TTransliterateRule& rule);
virtual void RemoveTranslationRule(const TTransliterateRule& rule);
virtual void RemoveTranslationBackRule(const TTransliterateRule& rule);
/*
 * Methods for range variables. These variables can be used for a limited degree of "wildcarding". Any character that is not in the source or target set can be designated to be a range variable. Effectively, that character will match against any of the characters in the specified range. Inverse ranges can also be specified: in that case, a character matches against any character not in the specified range.
 * For example, if $ is defined as a range variable equal to "ei", then the two rules:
 *     c[$>S
 *     c>K
 * will cause "c" in front of "i" or "e" to be converted to "S", and in front of anything else to be converted to "K".
 */
virtual void AddVariable(TransliterateVariable variableName,
                         const TBaseText& variableValue,
                         Boolean inverse = FALSE);
virtual Boolean IsVariable(TransliterateVariable ch) const;
/* Checks to see if a particular Unicode character represents a variable. */
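The range-variable behavior described above can be sketched in a few lines of standard C++. This is only an illustration under stated assumptions: the names Variable, Rule, charMatches, applyRules and convertC are hypothetical and are not part of TRuleBasedTransliterator, rules are tried in the order given, and only a single-character succeeding context is handled.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration; not the classes declared in this document.
struct Variable {
    std::string contents;  // characters in the range
    bool inverse;          // true: match any character NOT in contents
};
using Variables = std::map<char, Variable>;

struct Rule {
    char source;      // single source character
    char succeeding;  // succeeding context, '\0' if none; may name a variable
    char result;      // replacement character
};

// True if rule character rc matches text character tc.
bool charMatches(char rc, char tc, const Variables& vars) {
    auto it = vars.find(rc);
    if (it == vars.end()) return rc == tc;  // literal comparison
    bool inRange = it->second.contents.find(tc) != std::string::npos;
    return it->second.inverse ? !inRange : inRange;
}

// Apply the first matching rule at each offset; unmatched characters are copied.
std::string applyRules(const std::string& text, const std::vector<Rule>& rules,
                       const Variables& vars) {
    std::string out;
    for (std::size_t i = 0; i < text.size(); ++i) {
        char replacement = text[i];
        for (const Rule& r : rules) {
            if (r.source != text[i]) continue;
            if (r.succeeding != '\0') {
                if (i + 1 >= text.size()) continue;  // no following character
                if (!charMatches(r.succeeding, text[i + 1], vars)) continue;
            }
            replacement = r.result;
            break;
        }
        out += replacement;
    }
    return out;
}

// The example from the text: $ = "ei", with rules c[$>S and c>K.
std::string convertC(const std::string& text) {
    Variables vars{{'$', Variable{"ei", false}}};
    std::vector<Rule> rules{{'c', '$', 'S'}, {'c', '\0', 'K'}};
    return applyRules(text, rules, vars);
}
```

With these two rules, the "c" in "cent" is followed by "e" and becomes "S", while a "c" followed by any other character, or by nothing, falls through to the second rule and becomes "K".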
/* Create iterators for the rules. */
virtual TIterator* CreateIterator() const;
virtual TIterator* CreateTranslateBackIterator() const;
Class TTransliterateRule
TTransliterateRule implements a context-sensitive transliteration rule. It is used within a TRuleBasedTransliterator.
Public Methods

/*
 * Construct a rule with four components. Note that these components cannot exceed 256 characters in length. (This is an arbitrary limitation imposed by the desire to save as much space in rule storage as possible.)
 */
TTransliterateRule(const TBaseText& keyText,
                   const TBaseText& resultText,
                   const TBaseText* const preceedContext = NIL,
                   const TBaseText* const succeedContext = NIL);

/*
 * Use the rule described by the current object to perform translation. This method is called by TRuleBasedTransliterator::Translate().
 */
virtual long Translate(TBaseText& sourceText,
                       unsigned long& sourceOffset,
                       unsigned long& numReplacedChars,
                       unsigned long& numNewChars) const;

/*
 * Determines if the rule defined by this object applies to a given text object.
 */
virtual Boolean DoesRuleApply(const TBaseText& text,
                              const unsigned long textOffset,
                              const TDictionary& variableTable,
                              Boolean& variableMatch,
                              unsigned long& translatableChars) const;
Class TKoreanTransliterator
TKoreanTransliterator implements Korean Jamo<->Hangul transliteration. It is a subclass of TTransliterator; however, the algorithm used is completely different from the rule-based transliteration implemented by TRuleBasedTransliterator. Background: Each Hangul character consists of two to three components, called Jamos. The first and second components are mandatory; however, the third is optional. The transliterator assumes that all defined Jamo and Hangul characters can be in the source text, even though some complex Jamos cannot be entered from certain keyboards. This means that another preprocessor is needed to compose the complex Jamos from the "type-able" Jamos. The reason that this step is separated from Jamo-Hangul transliteration is that the Jamo-Jamo composition is highly dependent on the keyboard layout, whereas the Jamo-Hangul process is not.
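The algorithm used by TKoreanTransliterator is not given in this description. As a rough illustration of why Jamo<->Hangul conversion does not need a rule table, the sketch below uses the arithmetic mapping between conjoining Jamo and precomposed syllables defined in the Unicode Standard. That choice, and the names composeHangul, decomposeHangul and Jamo, are assumptions made for this example.

```cpp
// Constants from the Unicode Standard's Hangul syllable composition arithmetic.
constexpr char32_t kSBase = 0xAC00, kLBase = 0x1100, kVBase = 0x1161, kTBase = 0x11A7;
constexpr char32_t kLCount = 19, kVCount = 21, kTCount = 28;

// Compose a leading consonant, a vowel and an optional trailing consonant
// (0 means "no trailing consonant"). Returns 0 if a component is out of range.
char32_t composeHangul(char32_t lead, char32_t vowel, char32_t trail = 0) {
    if (lead < kLBase || lead >= kLBase + kLCount) return 0;
    if (vowel < kVBase || vowel >= kVBase + kVCount) return 0;
    char32_t tIndex = 0;
    if (trail != 0) {
        if (trail <= kTBase || trail >= kTBase + kTCount) return 0;
        tIndex = trail - kTBase;
    }
    char32_t lIndex = lead - kLBase;
    char32_t vIndex = vowel - kVBase;
    return kSBase + (lIndex * kVCount + vIndex) * kTCount + tIndex;
}

struct Jamo {
    char32_t lead, vowel, trail;  // trail is 0 when the syllable has two components
};

// Reverse direction. Returns {0, 0, 0} if the input is not a precomposed syllable.
Jamo decomposeHangul(char32_t syllable) {
    Jamo none = {0, 0, 0};
    if (syllable < kSBase || syllable >= kSBase + kLCount * kVCount * kTCount) return none;
    char32_t sIndex = syllable - kSBase;
    char32_t tIndex = sIndex % kTCount;
    Jamo j;
    j.lead = kLBase + sIndex / (kVCount * kTCount);
    j.vowel = kVBase + (sIndex % (kVCount * kTCount)) / kTCount;
    j.trail = tIndex ? kTBase + tIndex : 0;
    return j;
}
```

The optional third component maps naturally onto the default argument: a syllable with only two Jamos is composed by omitting the trailing consonant.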
Class THexTransliterator
THexTransliterator is derived from TTransliterator. It transforms hex numbers between one and four digits to their Unicode representations. When used inline, THexTransliterator provides a simple input method for generating Unicode characters that cannot be entered with a keyboard.
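The hex-to-character conversion can be sketched as follows. The helper name hexToUtf8 is hypothetical and is not part of THexTransliterator's interface; producing UTF-8 as the output encoding is also an assumption made so that the example works with std::string.

```cpp
#include <string>

// Hypothetical helper for illustration only.
// Converts one to four hex digits to the UTF-8 encoding of that code point.
// Returns an empty string for malformed input or for surrogate values.
std::string hexToUtf8(const std::string& hex) {
    if (hex.empty() || hex.size() > 4) return "";
    unsigned int cp = 0;
    for (char c : hex) {
        unsigned int digit;
        if (c >= '0' && c <= '9') digit = c - '0';
        else if (c >= 'a' && c <= 'f') digit = c - 'a' + 10;
        else if (c >= 'A' && c <= 'F') digit = c - 'A' + 10;
        else return "";
        cp = cp * 16 + digit;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF) return "";  // surrogates are not characters
    std::string out;
    if (cp < 0x80) {                               // one byte
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {                       // two bytes
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                                       // three bytes (up to U+FFFF)
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

Four hex digits cover the range up to U+FFFF, so three bytes of UTF-8 are always enough here.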
Class TSystemTransliterator
TSystemTransliterator is the user's interface to the system's transliterators. It can be used to get and set the current transliteration, and to query for the available transliterators on the system, as well as to perform all the public protocol of TTransliterator. It is notified when a change occurs in the system transliterator.
Public Methods (TTransliterator overrides)

/* The following constants define the transliterators currently available in the system. */
static const TToken& kRomanTransliterator;
static const TToken& kJapaneseTransliterator;
static const TToken& kKatakanaTransliterator;
static const TToken& kArabicTransliterator;
static const TToken& kCapitalizeTransliterator;
static const TToken& kHebrewTransliterator;
static const TToken& kDevanagariTransliterator;
static const TToken& kRussianTransliterator;
static const TToken& kSymbolTransliterator;
static const TToken& kGreekTransliterator;

/*
 * Query for all available transliterators. Returns a TCollection of TTokens, each being the name of a transliterator.
 */
virtual void AvailableTransliterators(TCollection& transliterators) const;

/*
 * Gets and sets the system transliterator by name. If SetSystemTransliterator() is called, the next call to any public method in this class will reflect the change.
 */
virtual void GetSystemTransliteratorName(TLocaleName& name) const;
virtual void SetSystemTransliterator(const TLocaleName& name);
EXAMPLES IN ACCORDANCE WITH A PREFERRED EMBODIMENT
Creating a TRuleBasedTransliterator from a Text File
The TRuleBasedTransliterator object can be created from a text file. The file contains an ordered sequence of definitions, range variables, forward rules and backward rules. The following example consists of fragments of a text file specification for Devanagari (Hindi). Comments are preceded with a hash mark (#).
The following are character identifier definitions. ($XXXX indicates a Unicode character.) Once an identifier definition has been processed, then any occurrence of that identifier within brackets will be replaced by the right-hand side. In these text file examples, the x) preceding context, y) source, z) succeeding context, a) result, b) rechecked result are represented as: x] y [z > a | b, as discussed in more detail below.
For example:
ka=$915
kha=$916
ga=$917
gha=$918
nga=$919
virama=$94D
aa=$93E
i=$93F
ii=$940
u=$941
uu=$942
rh=$943
lh=$944
e=$947
ai=$948
o=$94B
au=$94C
The following are range variables:
&:{virama}{aa}{ai}{au}{ii}{i}{uu}{u}{rrh}{rh}{lh}{e}{o}
~:bcdfghjklmnpqrstvwxyz
The following convert from Latin letters to Devanagari letters.
kh>{kha} | {virama}
k>{ka} | {virama}
q>{ka} | {virama}
{virama}aa>{aa}
{virama}ai>{ai}
{virama}au>{au}
{virama}ii>{ii}
{virama}i>{i}
# otherwise convert independent forms when separated by ': k'ai -> {ka}{virama}{wai}
{virama}'aa>{waa}
{virama}'ai>{wai}
{virama}'au>{wau}
{virama}'ii>{wii}
# convert to independent forms at start of word or syllable:
# e.g. keai -> {ka}{e}{wai}; k'ai -> {ka}{wai}; (ai) -> ({wai})
aa>{waa}
ai>{wai}
au>{wau}
ii>{wii}
i>{wi}
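A rule line in the notation used above can be split into its five fields with a small parser. The sketch below is not the parser used by TRuleBasedTransliterator; ParsedRule, trim and parseRule are hypothetical names, and it assumes that the rechecked result is introduced by a vertical bar and that ">" marks a forward rule while "<" marks a backward rule.

```cpp
#include <optional>
#include <string>

// Hypothetical container for the fields of one rule line.
struct ParsedRule {
    std::string preceding, source, succeeding, result, rechecked;
    bool backward = false;  // true for '<' rules
};

// Strip leading and trailing spaces and tabs.
static std::string trim(const std::string& s) {
    const char* ws = " \t";
    std::size_t b = s.find_first_not_of(ws);
    if (b == std::string::npos) return "";
    std::size_t e = s.find_last_not_of(ws);
    return s.substr(b, e - b + 1);
}

// Parse "x] y [z > a | b". Returns nothing for comments, blank lines,
// and lines without an arrow (definitions and range variables).
std::optional<ParsedRule> parseRule(const std::string& line) {
    std::string text = trim(line);
    if (text.empty() || text[0] == '#') return std::nullopt;
    std::size_t arrow = text.find_first_of("<>");
    if (arrow == std::string::npos) return std::nullopt;

    ParsedRule rule;
    rule.backward = (text[arrow] == '<');
    std::string left = text.substr(0, arrow);
    std::string right = text.substr(arrow + 1);

    std::size_t close = left.find(']');  // end of preceding context
    if (close != std::string::npos) {
        rule.preceding = trim(left.substr(0, close));
        left = left.substr(close + 1);
    }
    std::size_t open = left.find('[');   // start of succeeding context
    if (open != std::string::npos) {
        rule.succeeding = trim(left.substr(open + 1));
        left = left.substr(0, open);
    }
    rule.source = trim(left);

    std::size_t bar = right.find('|');   // start of rechecked result
    if (bar != std::string::npos) {
        rule.rechecked = trim(right.substr(bar + 1));
        right = right.substr(0, bar);
    }
    rule.result = trim(right);

    if (rule.source.empty()) return std::nullopt;  // every rule needs a source
    return rule;
}
```

Identifier references such as {kha} are left unexpanded here; a fuller reader would first substitute them using the definitions at the top of the file.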
The following rules convert back from Devanagari letters to Latin letters. Note that a less-than sign is used for the backwards rules.
# normal consonants
{kha}[&<kh
{kha}<kha
{ka}{virama}[{ha}<k'
{ka}[&<k
{ka}<ka
{gha}[&<gh
{gha}<gha
# dependent vowels (should never occur except following consonants)
{aa}<aa
{ai}<ai
{au}<au
{ii}<ii
# independent vowels (when following consonants)
~]{waa}<'aa
~]{wai}<'ai
~]{wau}<'au
~]{wii}<'ii

Translating Text Using the System Transliterator
void TranslateText(TBaseText& text)
{
    // Instantiates a TSystemTransliterator. Will perform transliteration
    // based on the object currently chosen for the system. The default
    // is "Roman", which includes accent composition and smart quotes.
    TSystemTransliterator transliterator;
    TText replacementText;
    TTextRange replacementTextRange;
    if (transliterator.Translate(text, TTextRange(0, text.Length()),
                                 replacementText, replacementTextRange)) {
        text.DeleteText(replacementTextRange.RangeBegin(),
                        replacementTextRange.RangeLength());
        text.InsertText(replacementText, text.Length());
    }
}
Creating and Using Pre-Defined Transliterator
void UpperCase(TBaseText& text)
{
    // Instantiates the "Capitalize" transliterator. Note that
    // creating a transliterator is very expensive since it has
    // to either build or stream in the table. Therefore it's
    // best to create once only.
    static TRuleBasedTransliterator upperCase(TSystemTransliterator::kCapitalizeTransliterator);
    TText replacementText;
    TTextRange replacementTextRange;
    if (upperCase.Translate(text, TTextRange(0, text.Length()),
                            replacementText, replacementTextRange)) {
        text.DeleteText(replacementTextRange.RangeBegin(),
                        replacementTextRange.RangeLength());
        text.InsertText(replacementText, text.Length());
    }
    // To change text into lower case, call the TranslateBack() method.
}

void ChangeSystemTransliterator(const TLocaleName& newTransliteratorName)
{
    // Instantiates a TSystemTransliterator.
    TSystemTransliterator transliterator;
    transliterator.SetSystemTransliterator(newTransliteratorName);
}
While the invention has been described in terms of a preferred embodiment in a specific system environment, those skilled in the art recognize that the invention can be practiced, with modification, in other and different hardware and software environments within the spirit and scope of the appended claims.
COPYRIGHT NOTIFICATION
Portions of this patent application contain materials that are subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.
CROSS-REFERENCE TO RELATED PATENT APPLICATIONS
This patent application is related to the patent application entitled Text Transliteration System, by Mark Davis and Judy Lin, filed concurrently with this application, and assigned to Taligent, the disclosure of which is hereby incorporated by reference.
Field of the Invention
This invention generally relates to improvements in computer systems and more particularly to intelligently transliterating text as it is input to a computer system.
Background of the Invention
US patent 5,148,541 discloses a multilingual database system including sorting data using a master universal sort order for all languages. The database can be searched, and data retrieved, by a user whether or not that data is in the user's own language. The data to be stored in the database is first encoded according to a master (or universal) sort order.
US patent 4,734,036 discloses a method and device for learning a language.
The patent discusses a teaching aid for reinforcing a student's ability to learn an unfamiliar language including an upper sheet (12) marked with symbolic indicia to be taught to the student and one or more base sheets (11), each marked with a different translated version of the indicia on the upper sheet. The indicia on each base sheet are marked in registry with the corresponding indicia on the upper sheet.
One edge of the base sheet is joined, temporarily or permanently, to a corresponding edge of the upper sheet to allow the upper sheet to be lifted up from the base sheet to briefly expose a corresponding translation, transliteration, interpretation, or paraphrase marked on the base sheet, then lowered again so that reading of the upper sheet can be instantly resumed.
US patent 4,547,765 discloses a method and circuit arrangement for transliteration of code words of a code having m-place code words into corresponding code words of a different code likewise having m-place code words. Individual bits of the code word to be transliterated are forwarded during serial input into an m-place shift register or during the serial output therefrom. These bits are forwarded non-negated or negated from register stage to register stage over a respective forwarding circuit depending upon the measure or criterion of coincidence or non-coincidence between the code word to be transliterated and the code words of the different code. This occurs in such manner that the traversing bits experience a respective negation in front of and after a register stage whose position within the shift register corresponds to the position of non-coinciding bits within the two code words.
Systems such as the Apple (R) Macintosh (R) or Microsoft (R) Windows (TM) have dead keys which are employed to extend the range of the keyboard for accented characters. With this mechanism, a user can type a key (e.g. option-u for umlaut) which puts the keyboard into a special state, but does not generate a character or any other visible indication of what has occurred. When the user then types a base character, one that combines with the accent, the keyboard generates the resulting accented character (for example, typing option-u, e produces ë). However, this approach requires a user to be cognizant of particular special keys associated with a particular task.
Summary of the Invention
Accordingly, it is a primary objective of the present invention to provide a set of flexibly defined rules stored in data structures in a computer system to automatically apply user specified transliterations to text as it is input to a computer system.
Brief Description of the Drawings
Figure 1 is a block diagram of a personal computer system in accordance with a preferred embodiment;
Figure 2 is a flowchart of the logic used to transliterate text between offsets start and finish in accordance with a preferred embodiment;
Figure 3 is a flowchart of the logic used to identify a type-in starting point in accordance with a preferred embodiment;
Figure 4 is a flowchart of the logic used to provide type-in transliteration in accordance with a preferred embodiment;
Figure 5 is an illustration of a display in accordance with a preferred embodiment; and
Figure 6 is an illustration of a transliteration operation as it would appear on a user's display in accordance with a preferred embodiment of the invention.
Detailed Description Of The Invention
The invention is preferably practiced in the context of an operating system resident on a personal computer such as the IBM (R) PS/2 (R) or Apple (R) Macintosh (R) computer. A representative hardware environment is depicted in Figure 1, which illustrates a typical hardware configuration of a workstation in accordance with the subject invention having a central processing unit 10, such as a conventional microprocessor, and a number of other units interconnected via a system bus 12. The workstation shown in Figure 1 includes a Random Access Memory (RAM) 14, Read Only Memory (ROM) 16, an I/O adapter 18 for connecting peripheral devices such as disk units 20 to the bus, a user interface adapter 22 for connecting a keyboard 24, a mouse 26, a speaker 28, a microphone 32, and/or other user interface devices such as a touch screen device (not shown) to the bus, a communication adapter 34 for connecting the workstation to a data processing network and a display adapter 36 for connecting the bus to a display device 38. The workstation has resident thereon an operating system such as the Apple System/7 (R) operating system.
KEYBOARD TRANSLITERATORS
On the Apple Macintosh computer, dead keys are used to extend the range of the keyboard for accented characters. With this mechanism, a user can type a key (e.g. option-u for umlaut) which puts the keyboard into a special state, but does not generate a character or any other visible indication of what has occurred. When the user then types a base character, one that combines with the accent, the keyboard generates the resulting accented character (e.g. option-u, e produces ë).

Dead-Key Example
Key        Deadkey state   Display
b          <none>          b
option-u   <umlaut>        b
a          <none>          bä
d          <none>          bäd

In a preferred embodiment of this invention, the modal mechanism is replaced by the use of transliterators. When an application inserts characters from a keyboard into some text, then it will call the list of transliterators associated with that keyboard, and the input method for the keyboard. An accent transliterator can provide the same functionality as dead keys. When an accent is typed, it will be combined with the previous "base" character if possible. For example:

Key        Pre-transliterate   Display
b          b                   b
a          ba                  ba
option-u   ba¨                 bä
d          bäd                 bäd

Transliterators also perform many other functions. For example, they can replace generic quotes (", ') by right-hand and left-hand quotes (“, ”, ‘, ’). They can also be used to perform general script transcriptions for any cases where the transcription is simple and unambiguous, as when converting from romaji to katakana or hiragana for Japanese, converting jamo (letter components) to hangul (letter syllables) for Korean, or converting QWERTY to Hebrew, etc. By convention, an apostrophe can be used to prevent characters from being transliterated together. For example, to form "ba d", one would type "ba' d".
INPUT TRANSLITERATION
Transliteration can also be used to phonetically convert between different languages. This feature is especially important for languages such as Japanese that use a Roman keyboard to key in text, which is then transcribed into native Japanese characters. These characters can also be converted back into Roman characters. A particular class of transliterations called input transliterations obey two requirements set forth below.
Uniqueness Transcription from native to foreign script is unambiguous. Two different native strings cannot correspond to the same foreign string. For example, if the native script distinguishes between a retroflex and a dental T, then a transliteration cannot map them onto the same symbol "t".
Completeness Transcription from native to foreign script, or from foreign to native is complete. Every sequence of native symbols will map onto some string of foreign symbols. Transcription from foreign to native script should be complete, but is not generally unambiguous. For example, when a Roman-to-Japanese transcription is used, "ra" and "la" map onto the same Japanese symbol.
A TTransliterator object is used to perform transliterations. Input transliterators are composed of a set of context-sensitive rules. These rules are designed to allow non-programmers to edit them reasonably for localization.
Examples of rules:
cho => ちょ
t[t => っ
to => と
Using these rules, chotto can be transliterated into ちょっと.
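The way such rules consume their source while only inspecting the succeeding context can be sketched as below. This is an illustration, not the embodiment's code: KanaRule, transliterate and romajiToKana are hypothetical names, rules are tried in order, and the hiragana outputs (ちょ, っ, と) are an assumption about the three example rules.

```cpp
#include <string>
#include <vector>

// Hypothetical rule record: the source is consumed, the succeeding context is only inspected.
struct KanaRule {
    std::string source;      // Roman characters to replace
    std::string succeeding;  // required following context (not consumed)
    std::string result;      // UTF-8 replacement
};

// At each offset apply the first rule whose source and succeeding context match;
// characters that no rule matches are copied unchanged.
std::string transliterate(const std::string& text, const std::vector<KanaRule>& rules) {
    std::string out;
    std::size_t i = 0;
    while (i < text.size()) {
        bool matched = false;
        for (const KanaRule& r : rules) {
            if (r.source.empty()) continue;
            if (text.compare(i, r.source.size(), r.source) != 0) continue;
            std::size_t after = i + r.source.size();
            if (text.compare(after, r.succeeding.size(), r.succeeding) != 0) continue;
            out += r.result;
            i = after;  // the succeeding context stays in the input
            matched = true;
            break;
        }
        if (!matched) out += text[i++];
    }
    return out;
}

// The three example rules, with the outputs assumed to be hiragana.
std::string romajiToKana(const std::string& text) {
    std::vector<KanaRule> rules{
        {"cho", "", "ちょ"},
        {"t", "t", "っ"},  // t[t: a "t" in front of another "t"
        {"to", "", "と"},
    };
    return transliterate(text, rules);
}
```

For "chotto", the first rule consumes "cho", the second consumes only the first "t" because the following "t" is context, and the third consumes "to".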
Transliteration may be dependent not only on script but also on language. It is also inherently an n x n problem: expecting to transliterate from Russian to Hindi by using a series of Cyrillic-Roman and Roman-Hindi transliterations is doomed to failure, since the transcriptions used to represent non-Roman letters will vary depending on the script being represented: in some cases th will represent the sound found in thick, while in others it is an aspirated t.
A preferred embodiment provides input transliteration from Roman to Japanese (Hiragana and Katakana), Russian, Greek, Arabic, Devanagari (Hindi), and Hebrew. There is also a "Symbols" transliterator which allows a user to enter any Unicode symbol by name, e.g. "apple-logo", and transcribe it to the actual character.
Transliterators can be chained. For example, one may wish to have a "smart-quote" transliterator be the first in chain, followed by an input transliterator. This mechanism is managed by the TTypingConfiguration object.
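Chaining can be pictured as applying each transliterator to the output of the previous one. The sketch below does not use TTypingConfiguration; Transliterator, applyChain, smartQuotes and toyGreek are hypothetical names, and the two stages are deliberately tiny stand-ins (a double-quote replacer and a two-letter Roman-to-Greek mapping) chosen only to show the ordering.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical: a transliterator modeled as a function from text to text.
using Transliterator = std::function<std::string(const std::string&)>;

// Run every stage in order, feeding each the previous stage's output.
std::string applyChain(const std::vector<Transliterator>& chain, std::string text) {
    for (const Transliterator& t : chain) text = t(text);
    return text;
}

// Stage 1: replace straight double quotes by alternating curly quotes (UTF-8).
std::string smartQuotes(const std::string& in) {
    std::string out;
    bool open = true;
    for (char c : in) {
        if (c == '"') {
            out += open ? "\xE2\x80\x9C" : "\xE2\x80\x9D";  // U+201C, U+201D
            open = !open;
        } else {
            out += c;
        }
    }
    return out;
}

// Stage 2: toy input transliterator mapping a and b to Greek alpha and beta.
std::string toyGreek(const std::string& in) {
    std::string out;
    for (char c : in) {
        if (c == 'a') out += "\xCE\xB1";       // U+03B1
        else if (c == 'b') out += "\xCE\xB2";  // U+03B2
        else out += c;
    }
    return out;
}
```

Putting the smart-quote stage first means the input transliterator never sees a straight quote, which matters when later stages give quote characters a meaning of their own.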
TEXT TRANSFORMATION THROUGH TRANSLITERATION
Transliteration can also be used for language-specific processing such as converting text from upper to lower case and back, creating title text (for example, "With the Important Words Capitalized"; title text requires use of a title-filter text service in order to eliminate articles, otherwise the titled text will appear as "With The Important Words Capitalized"), and stripping diacritical marks. Note that these results are achieved by modifying the text objects directly. If the transformation is only desired for display, and should not affect the actual text objects, there is an alternative method employed using metamorphosis tables in the font.
TRANSLITERATOR DESKTOP OBJECTS
Transliterators are desktop objects. For example, a user would add a transliterator to his typing configuration by dragging it from the desktop to a typing configuration encapsulator. TTransliteratorModel encapsulates, accesses and manages the transliterator data. TTransliteratorUserInterface is responsible for the icon and thumbnail presentations as well as the user interface to edit the data. A TModelSurrogate is used as a stand-in for the model, and will typically be imbedded in the typing configuration model.
Programmatic access to all available transliterators: TTransliterator::GetAvailableTransliterators(TCollection&) will return a collection of TModelSurrogate objects.
Identifying transliterators: Transliterator objects are stored as TFiles. The Pluto attribute is used to identify a transliterator.
Transliteration Internals
Background
Rule-based transliteration is designed for relatively unsophisticated users and localizers to be able to create and modify. The rules are designed to be straightforward, and to apply especially to the case of transliteration as the user types. Although designed to meet the specific needs of transcribing text between different scripts, either during user type-in or converting a range of text in a document, transliteration uses a general-purpose design which is applicable to a wide range of tasks.
A transliteration rule consists of two main parts: a source and a result. The source can be accompanied by two strings which specify the context in which the conversion is to be made. Every rule must have a source field, but the other fields may be empty. For example:

Simple Rule
preceding context   source   succeeding context   result
                    c        i                    s

Notice above, a c is turned into an s, but only if it is followed by an i.
Variables can be used to have multiple matches for characters in the context. For example:

Rule with Variables
preceding context   source   succeeding context   result
                    c        --                   s

Simple Inclusive Variable
Variable   Meaning
--         eiy

A c is turned into an s, but only if it is followed by an e, i or y. There are also exclusive variables, which match if the text character is not present in the variable's contents. An exclusive variable, with empty contents, will match any character. In normal operation, once a string has been replaced, none of the replacement characters are subsequently checked for matches. However, an additional rechecked result field can be specified that will be rechecked for matches and possibly modified. An example is presented below.

Rule with rechecked result
preceding context   source        succeeding context   result   rechecked result
                    k                                  {ka}     {virama}
                    {virama}aa                         {aa}

In this case, the sequence kaa will be converted into {ka}{aa}. In the Indic languages, this can be used to capture the interactions between consonants and vowels in a general way.
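The effect of a rechecked result can be sketched by pushing that part of the replacement back onto the unprocessed input, while the ordinary result is emitted and never examined again. RecheckRule, transliterateWithRecheck and devanagariDemo are hypothetical names, and the ASCII identifiers {ka}, {virama} and {aa} stand in for the Devanagari characters, following the text-file notation used in the examples.

```cpp
#include <string>
#include <vector>

// Hypothetical rule record for illustration.
struct RecheckRule {
    std::string source;
    std::string result;     // emitted; never examined again
    std::string rechecked;  // pushed back so that later rules can match it
};

std::string transliterateWithRecheck(std::string pending,
                                     const std::vector<RecheckRule>& rules) {
    std::string out;
    while (!pending.empty()) {
        bool matched = false;
        for (const RecheckRule& r : rules) {
            if (r.source.empty()) continue;
            if (pending.compare(0, r.source.size(), r.source) != 0) continue;
            out += r.result;
            // The rechecked part rejoins the unprocessed text.
            pending = r.rechecked + pending.substr(r.source.size());
            matched = true;
            break;
        }
        if (!matched) {
            out += pending[0];
            pending.erase(0, 1);
        }
    }
    return out;
}

// k > {ka} | {virama}   and   {virama}aa > {aa}
std::string devanagariDemo(const std::string& roman) {
    std::vector<RecheckRule> rules{
        {"{virama}aa", "{aa}", ""},
        {"k", "{ka}", "{virama}"},
    };
    return transliterateWithRecheck(roman, rules);
}
```

For "kaa", the k rule emits {ka} and leaves {virama} in front of "aa", so the second rule then matches and yields {ka}{aa}; a lone "k" keeps its {virama}.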
In the text, rules are written in the following format. The three matching fields are separated from the two resulting fields by an arrow, with the contexts and rechecked result distinguished by strike-through from the adjacent fields.
Example~ ) ab;
Example: 3~ 0 ab;
A Transliteration is designed to hold two sets of rules, so that it can provide transliteration in both directions, such as from Katakana to Latin, and back.
Definitions
The following definitions are used in describing the internal matching process used in a preferred embodiment. There are two different ways to apply Transliteration: one is to a range of text, and the other is on type-in. The definitions apply to both cases.
A rule rj is made of preceding context, source, succeeding context, result and rechecked result. The preceding context, source, and succeeding context are referred to collectively as the source fields, and the result and rechecked result are referred to collectively as the result fields. In the following, the length of each part is abbreviated by length(pc), length(s), etc.
A character ck in the text matches a character cr in the rule if and only if either:
cr is not a variable and ck = cr (this is strong, bitwise identity);
cr is an inclusive variable and ck ∈ contents[cr]; or
cr is an exclusive variable and ¬(ck ∈ contents[cr]).
A rule matches the text within a range at offset i if and only if all of the following hold:
the length(pc) characters before i match the preceding context,
the length(s) characters after i match the source,
the length(sc) characters, from i + length(s) on, match the succeeding context.

| a | b | c | d | e | f | g | h |

The text set forth above is used in the following examples.
Example: bcd 6 z matches at offset 1; and bc~ 6 z matches at offset 2.
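The matching definitions translate almost directly into code. In the sketch below Var, VarTable, MatchRule, charMatches, segmentMatches and matchesAt are hypothetical names, text is held in a std::string, and offsets count characters from 0 at the start of the text.

```cpp
#include <map>
#include <string>

// Hypothetical types for illustration.
struct Var {
    std::string contents;
    bool exclusive;  // true: matches characters NOT in contents
};
using VarTable = std::map<char, Var>;

struct MatchRule {
    std::string preceding, source, succeeding;
};

// A text character ck matches a rule character cr.
bool charMatches(char cr, char ck, const VarTable& vars) {
    auto it = vars.find(cr);
    if (it == vars.end()) return ck == cr;  // not a variable: exact identity
    bool member = it->second.contents.find(ck) != std::string::npos;
    return it->second.exclusive ? !member : member;
}

// Every character of pattern matches the text starting at start.
bool segmentMatches(const std::string& pattern, const std::string& text,
                    std::size_t start, const VarTable& vars) {
    if (start + pattern.size() > text.size()) return false;
    for (std::size_t k = 0; k < pattern.size(); ++k) {
        if (!charMatches(pattern[k], text[start + k], vars)) return false;
    }
    return true;
}

// The rule matches at offset i: preceding context before i, source at i,
// succeeding context from i + length(source) on.
bool matchesAt(const MatchRule& r, const std::string& text, std::size_t i,
               const VarTable& vars) {
    if (i < r.preceding.size()) return false;  // not enough text before i
    return segmentMatches(r.preceding, text, i - r.preceding.size(), vars) &&
           segmentMatches(r.source, text, i, vars) &&
           segmentMatches(r.succeeding, text, i + r.source.size(), vars);
}
```

With the sample text abcdefgh, a rule whose source is bcd matches at offset 1. A rule with preceding context b and source cd, used here as an extra illustration, matches at offset 2.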
A rule matches the text within a range at offset i up to offset j if and only if you could add (zero or more) characters after j that would cause the rule to match at i.
Example: bcde 6 z matches at offset 2 up to offset 4.
A rule spans offset i in the text if and only if there is some j < i such that the rule matches the text at j up to i.
Example: ~cdef O g~ matches at offset 2, and spans offsets l through 5.
Example: de~ O i spans offsets 4 and 5.
(The text characters def match the initial part of the rule, so the offsets between those characters are spanned, even though the x doesn't match.)
Basic Operation on Ranges
With a range of text, only the characters within the range are matched or replaced; the characters outside of the match are completely ignored. The Transliteration operation proceeds as follows. Iterate through the offsets in the range one by one. For each offset i, check through the list of rules in the transliteration for the matching rules at that offset. If there are no matching rules, then continue iterating. If there is more than one matching rule, then pick the best match as follows:
Figure 2 is a flowchart setting forth the detailed logic of transliterating text between offsets start and finish in accordance with a preferred embodiment of the invention. Processing commences at function block 200 where index i is initialized to point to the start of the text that is to be transliterated. A test is performed at decision block 210 to determine if the index i has surpassed the finish of the text. If all the text has been processed, then processing is terminated at terminal 220. Otherwise, currentrule is zeroed, the index j is equated with the first rule found using text(i) as an index, and jmax is equated with the last rule found using text(i) as an index, as shown in function block 230.
At decision block 240, a test is performed to determine if the index j has exceeded jmax. If so, then a test is performed at decision block 260 to determine if currentrule is equal to zero. If so, then at function block 270, characters are replaced as indicated and indexes are reset before passing control to decision block 240. If the index j has not exceeded jmax at decision block 240, then Rule(j) is compared at decision block 242 to the character located at index i. If they match, then another test is performed at decision block 244 to determine if Rule(j) is better than currentRule. If so, then currentRule is equated to Rule(j) at function block 250, j is incremented at function block 246, and control is passed to decision block 240. If there is not a match at decision block 242, then j is incremented at function block 246 and control is passed to decision block 240.
In the processing described above, a match x is strictly better than a match y if and only if either:
- the preceding context of x is longer than the preceding context of y, and the source + succeeding context of x is at least as long as the source + succeeding context of y; or
- the preceding context of x is at least as long as the preceding context of y, and the source + succeeding context of x is longer than the source + succeeding context of y.
Of the matching rules, eliminate all those where there is another matching rule which is strictly better. Of the remainder, pick the first one. (Note that rules are inserted into a Transliteration in order, and that order may be significant.)
Example: abc → p is better than ab → q; yab → r is also better than ab → q; however, neither abc → p nor yab → r is better than the other.
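The "strictly better" comparison and the selection step can be sketched as below. This is a minimal illustration under stated assumptions: MatchLengths, StrictlyBetter and PickBest are hypothetical names, each candidate is reduced to the lengths of its three source fields, and the candidates are assumed to be listed in rule insertion order.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Field lengths of a rule that matched at the current offset.
struct MatchLengths {
    std::size_t precedingContext;
    std::size_t source;
    std::size_t succeedingContext;
};

// x is strictly better than y when it is longer on one side (preceding
// context, or source + succeeding context) and at least as long on the other.
bool StrictlyBetter(const MatchLengths& x, const MatchLengths& y) {
    const std::size_t xTail = x.source + x.succeedingContext;
    const std::size_t yTail = y.source + y.succeedingContext;
    return (x.precedingContext > y.precedingContext && xTail >= yTail)
        || (x.precedingContext >= y.precedingContext && xTail > yTail);
}

// Returns the index of the first candidate that no other candidate strictly
// beats. The list must not be empty.
std::size_t PickBest(const std::vector<MatchLengths>& matches) {
    assert(!matches.empty());
    for (std::size_t a = 0; a < matches.size(); ++a) {
        bool beaten = false;
        for (std::size_t b = 0; b < matches.size(); ++b) {
            if (b != a && StrictlyBetter(matches[b], matches[a])) {
                beaten = true;
                break;
            }
        }
        if (!beaten) return a;
    }
    // Not reached: the relation is a strict partial order, so a finite,
    // non-empty list always has at least one unbeaten candidate.
    return 0;
}
```

In the example above, if y is read as the preceding context of yab → r, the three rules have lengths (0, 3, 0), (0, 2, 0) and (1, 2, 0); ab → q is eliminated and the earlier of the two remaining rules is picked.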
In order to speed up matching of rules, the collection of rules is indexed by the first character in the source of each rule. This does not affect the ordering of the rules, since any two rules that do not share the same first character of the source will never match at the same time, and thus never conflict. Rules with variables in the first position are resolved at the time that they are added to a Transliteration: that is, if the variable has n characters in its contents, then n different rules with the different first letters are added to the Transliteration. When processing forward (either ranges or type-in), the rules are looked up by this first character, then sequentially accessed.
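One way to picture this index is the sketch below. It is an assumption-laden illustration rather than the patent's data structure: RuleIndex, IndexedRule and their members are hypothetical names, sources are plain strings, and only an inclusive variable in the first position is handled.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

struct IndexedRule {
    std::string source;  // literal source; its first character is the index key
    std::string result;
};

class RuleIndex {
public:
    // Adds a rule whose source is entirely literal.
    void Add(const std::string& source, const std::string& result) {
        assert(!source.empty());
        rules_.push_back({source, result});
        byFirstChar_[source[0]].push_back(rules_.size() - 1);
    }

    // Adds a rule whose first source position is an inclusive variable:
    // one concrete rule is stored per character in the variable's contents.
    void AddWithLeadingVariable(const std::string& contents,
                                const std::string& restOfSource,
                                const std::string& result) {
        for (char c : contents) Add(std::string(1, c) + restOfSource, result);
    }

    // Positions (in insertion order) of the rules whose source starts with c.
    std::vector<std::size_t> Candidates(char c) const {
        auto it = byFirstChar_.find(c);
        return it == byFirstChar_.end() ? std::vector<std::size_t>() : it->second;
    }

    const IndexedRule& At(std::size_t k) const { return rules_[k]; }

private:
    std::vector<IndexedRule> rules_;
    std::map<char, std::vector<std::size_t>> byFirstChar_;
};
```

Because each bucket keeps positions in insertion order, the relative order of rules that can conflict is preserved, which is the property the paragraph above relies on.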
Once a matching rule is identified, a replacement is performed. The source matched by the rule is replaced by the result fields (result + rechecked result). This may change the length of the text, since the result fields may differ in length from the source. Resume iterating, starting at the offset i + length(result). This means that the rechecked result may be matched and modified. (The plain result can match against the preceding context of another rule, but will not be matched against the source, and thus cannot be subsequently modified.)
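Putting the pieces together, the range operation can be sketched as below. This is a simplified sketch, not the patent's implementation: LiteralRule, MatchesAt, StrictlyBetter and Transliterate are hypothetical names, rules contain only literal characters (no variables), the whole string is treated as the range, and the rule set is assumed to terminate (for example, no rule rewrites its source into an identical rechecked result).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct LiteralRule {
    std::string precedingContext;
    std::string source;
    std::string succeedingContext;
    std::string result;
    std::string recheckedResult;
};

// True if the rule's three source fields match the text around offset i.
bool MatchesAt(const std::string& text, std::size_t i, const LiteralRule& r) {
    const std::size_t pc = r.precedingContext.size();
    if (pc > i) return false;
    return text.compare(i - pc, pc, r.precedingContext) == 0
        && text.compare(i, r.source.size(), r.source) == 0
        && text.compare(i + r.source.size(), r.succeedingContext.size(),
                        r.succeedingContext) == 0;
}

bool StrictlyBetter(const LiteralRule& x, const LiteralRule& y) {
    const std::size_t xTail = x.source.size() + x.succeedingContext.size();
    const std::size_t yTail = y.source.size() + y.succeedingContext.size();
    const std::size_t xPc = x.precedingContext.size();
    const std::size_t yPc = y.precedingContext.size();
    return (xPc > yPc && xTail >= yTail) || (xPc >= yPc && xTail > yTail);
}

std::string Transliterate(std::string text, const std::vector<LiteralRule>& rules) {
    std::size_t i = 0;
    while (i < text.size()) {
        // Collect every rule that matches at offset i, in insertion order.
        std::vector<const LiteralRule*> matches;
        for (const LiteralRule& r : rules) {
            assert(!r.source.empty());  // an empty source could never advance
            if (MatchesAt(text, i, r)) matches.push_back(&r);
        }
        // Pick the first match that no other match strictly beats.
        const LiteralRule* best = nullptr;
        for (const LiteralRule* a : matches) {
            bool beaten = false;
            for (const LiteralRule* b : matches)
                if (StrictlyBetter(*b, *a)) { beaten = true; break; }
            if (!beaten) { best = a; break; }
        }
        if (best == nullptr) { ++i; continue; }
        // Replace the source, then resume after the plain result so that
        // only the rechecked result can be matched again.
        text.replace(i, best->source.size(), best->result + best->recheckedResult);
        i += best->result.size();
    }
    return text;
}
```

Resuming at i + length(result) is what keeps the plain result from being rewritten while leaving the rechecked result open to later rules.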
Operating on Type-In
When transliteration is applied to type-in, the operation is somewhat different. The goal is to produce the same results as would have occurred had the user typed in all of the text without transliteration, then converted it with the range conversion above. In addition, text is converted as it is entered.
It is difficult to predict what characters will follow after an initial text entry, so the process of transliteration cannot be completed until the user has entered additional characters. Example: suppose that there are rules ph → φ and p → π. If the user has just typed in a p, it is impossible to finalize a match because an ambiguity between the two rules exists. When the user types a new character, that character may be modified, and preceding characters may also change, since a unique rule may now be specified.
The other complication is that it is impossible to predict the starting point of the range, because the user may have just changed to the current transliteration, or just clicked in a new location. So, only text that could not have been converted without the additional characters is converted. Also, if the user is inserting text, all characters after the insertion point are always ignored, so the operation always behaves as if it is at the end of the text.
Figure 3 is a flowchart setting forth the detailed logic of transliteration processing between two portions of a text delimited by textStart and textFinished in accordance with a preferred embodiment of the invention. Processing commences at function block 300 where an index i is initialized, and imin is initialized. Then, at decision block 310, a test is performed to determine if i < imin. If not, then i+1 is returned at terminal block 320. If i is less than imin, then at function block 330, the currentrule variable is set equal to zero, j is equated to the rule found using i as an index into text for finding the first rule, and jmax is equated to the rule found using i as an index into text for finding the last rule. Then, at decision block 340, a test is performed to determine if j is greater than jmax. If so, then i+1 is returned at terminal block 320. If not, then a test is performed at decision block 350 to determine if rule(j) matches from i up to start after counting characters. If so, then i is incremented and processing passes to decision block 310. If not, then j is incremented and control is passed to decision block 340 for further processing.
The operation proceeds as follows. Suppose that a user types a key which causes n characters to be inserted at offset i. After inserting the characters at offset i, check through all of the rules that span i. Find the smallest offset s that corresponds to the start of the source of one of these spanning rules. (Indexing by first character of source does not help in this case, since a backward search is performed, and the current character may be any character in the source or succeeding context of one of these rules. To handle this, the transliteration stores a number maximumBackup, which is the length of the longest (source + succeeding context - 1). This is the furthest point back that a spanning rule could start at. The transliteration then searches from i - maximumBackup forward to find the first rule that spans the desired index, which determines the index s.)
Figure 4 is a flowchart setting forth the detailed logic of type-in transliteration in accordance with a preferred embodiment of the invention.
Processing commences at function block 400 where the type-in starting point is determined. Then, at decision block 410, a test is performed to determine if the index i is greater than or equal to finish. If not, then processing is completed at terminal 412. If so, then at function block 414, currentrule is equated to zero, the index j is equated to the first rule indexed by the index i, and jmax is set equal to the last rule indexed by the index i. Then, another test is performed at decision block 416 to determine if j is greater than jmax. If not, then a test is performed at decision block 414 to determine if Rule(j) spans start. If so, then processing is completed, and control is passed to terminal 412. If so, then another test is performed at decision block 430 to determine if currentrule is equal to zero. If so, then the index i is incremented and control is passed to decision block 410. If not, then at function block 440, characters are replaced in accordance with the current rule, and processing is passed to decision block 410.
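The maximumBackup bound used by the backward search described above could be computed along these lines. This is only a sketch: RuleLengths, MaximumBackup and SearchStart are hypothetical names, and clamping the search start at the beginning of the text is an assumption the description does not spell out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Lengths of the fields that can lie at or after a rule's starting offset.
struct RuleLengths {
    std::size_t source;
    std::size_t succeedingContext;
};

// Longest (source + succeeding context - 1) over all rules: the furthest a
// spanning rule could start before the insertion offset.
std::size_t MaximumBackup(const std::vector<RuleLengths>& rules) {
    std::size_t longest = 0;
    for (const RuleLengths& r : rules) {
        assert(r.source > 0);  // every rule is assumed to have a non-empty source
        longest = std::max(longest, r.source + r.succeedingContext - 1);
    }
    return longest;
}

// First offset the search has to examine for an insertion at offset i,
// clamped so that it never runs off the start of the text.
std::size_t SearchStart(std::size_t i, std::size_t maximumBackup) {
    return i > maximumBackup ? i - maximumBackup : 0;
}
```

Since the value depends only on the rule set, it can be updated once as each rule is added rather than recomputed on every keystroke.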
Once the smallest offset s is identified, identify all the rules that match at s up to i + n. If any of these rules also span i + n, then no conversions are performed. Otherwise, identify the best match within the transliteration ranges, and perform the substitution. Then, reset s according to the result and repeat this processing until done.
| a | b | c | d | g | h | i | j |
In all the following examples, ef has been inserted at offset 4 in the above text.
Example: with the rules (cd → x; de → y), s = 3; so convert the de to y, and do not convert the cd.
Example: with the rules (cdm → x; cd → y), s = 2; so convert the cd to y, and do not affect the e.
Example: with the rules (~bcdem → x; ab → y; bc → z; bcm → w; de → v), s = 1; so convert the bc to z, and the de to v.
Human Interface
An example of the user interface display used to create transliteration rules is presented in Figure 5. This Figure shows a transliteration display in the process of being created in accordance with a preferred embodiment of the invention. The rules are in the top box, with context variables in the center, and test samples at the bottom.
New rules can be added at the end, or inserted in the proper order. If rules do not conflict, or if any rule is inserted after a "better" rule, then it can be automatically reordered. For example, if the rule a → x exists, and the user adds the rule ab → y, then the inserted rule is reordered before the existing rule since otherwise it would have no effect. New context variables can be added at the end, because order is unimportant. The test source is transliterated into the result whenever any change is made, so that the user can check the correctness of the actions.
Figure 6 is an illustration of a transliteration operation as it would appear on a user's display in accordance with a preferred embodiment of the invention. At label 610, a sentence is typed in with transliteration disabled. Label 680 shows the sentence appearing at label 610 after it has been selected and fully transliterated.
Labels 620 to 670 show what the user sees as she types in successive characters from label 610 with input transliteration enabled. At label 620, the user has entered the character "c". At label 630, the user has typed in "ch". At label 640, the user has typed "cho", and there is enough context to invoke the appropriate rule and transliterate the text input into "~> ~". A similar operation is performed on the additional examples appearing at labels 650 to 680.
TRANSLITERATOR CLASS AND METHOD DESCRIPTION
Class TTransliterator
TTransliterator is an abstract base class that transforms text based upon some well-defined rules or algorithm. It descends from TTextModifier, and can be used in conjunction with a word processing engine for inline transliteration during typing. Subclasses of TTransliterator can be used to perform intra-script transformation such as accent composition and upper or lower casing, or inter-script phonetic transcription. Inter-script transliteration provides an alternative way of entering non-Roman text. For example, instead of using a Hebrew keyboard, Hebrew text can be entered with an American keyboard using Roman to Hebrew transliteration. This class is also used to aid typing by providing smart quotes and other punctuation modifiers. TTransliterator provides methods for translating text as well as reversing the effect. These methods must be overridden by the concrete subclasses. TTransliterator is designed to enable chaining multiple objects together. One or more such objects can work within the editable text classes to allow for transliteration during typing.
The replacement text will also have the correct styles. E.g., if the source text is "aeiou", and the replacements are "Æ" for "ae", "I" for "i", "O" for "o" and "U" for "u", then the replacement text will be: "ÆIOU". If a replacement for some characters is a larger number of characters, then the last style will be extended. For example, if the source is "Why", and the replacements are "VV" for "W", then the replacement text is: "VVhy". The "C" source code used to implement a preferred embodiment in accordance with the description above is set forth below.
Public Methods
/*
 * TTextModifier overrides
 */
virtual Boolean WantEvent(const TEvent& event) const = 0;
/* Override to always return FALSE. */
virtual Boolean ProcessEvent(const TEvent& event, const TModelCommand& currentCommand) = 0;
/* Override to always return FALSE. */
virtual Boolean ProcessNewText(const TBaseText& newText, const unsigned long numDeletes, const TInsertionPoint& insertionPoint, TTextRange& rangeToDelete, TBaseText& textToInsert) = 0;
/* Override to call Translate(). */
virtual void ProcessNewSelection(const TInsertionPoint& insertionPoint, unsigned long length) = 0;
/* Override to do nothing. Transliterators don't care about new selection. */
/*
 * These methods will directly translate (or translate back) the TBaseText argument.
 * Subclass: these methods call the appropriate Translate() method and should not be overridden.
 */
Boolean SimpleTranslate(TBaseText& text) const;
Boolean SimpleTranslateBack(TBaseText& text) const;
/*
 * Translate() takes a sourceText, and a range within that text, and either modifies the text directly, or produces the replacement text and the range of characters to replace. In the latter case, the replacement range is always a subset of the source range. It is up to the caller to call the appropriate text methods to actually substitute the replacement. The actual transformation is defined by a concrete subclass.
 */
virtual Boolean Translate(const TBaseText& sourceText, const TTextRange& sourceRange, TBaseText& replacementText, TTextRange& replacementTextRange) const = 0;
virtual Boolean Translate(TBaseText& sourceText, const TTextRange& sourceRange, TTextRange& replacementTextRange, unsigned long& numNewChars) const = 0;
/* This method is similar to the one above, the difference being that given a text object and a range within it, this method will directly change the text object. */
/*
 * TranslateBack() takes the same arguments, but translates in the reverse direction, from target characters back to source characters using a translation mechanism defined by a concrete subclass. Note that these methods are not suitable for use with keyboard entry.
 */
virtual Boolean TranslateBack(const TBaseText& sourceText, const TTextRange& sourceRange, TBaseText& replacementText, TTextRange& replacementTextRange) const = 0;
virtual Boolean TranslateBack(TBaseText& sourceText, const TTextRange& sourceRange, TTextRange& replacementTextRange, unsigned long& numNewChars) const = 0;
Class TRuleBasedTransliterator
TRuleBasedTransliterator is derived from TTransliterator. It uses a set of context-sensitive rules to transform text, and a parallel set to reverse this action. These rules are designed such that a knowledgeable non-programmer can edit them for localization. Roman rule-based transliteration is available for Japanese (kana), Hebrew, Arabic, Greek, Russian, and Devanagari.
TRuleBasedTransliterator also has the capability of specifying a range variable. These variables can then be used in the rules to provide some simple pattern-matching functions. An example of the foregoing is presented below.
Public Methods
/*
 * Constructor
 */
TRuleBasedTransliterator(const TFile& rulesFile);
/* Instantiate a transliterator based on the rules in a file. */
/*
 * TTransliterator overrides. Translate() uses a set of context sensitive rules to perform the translation. See TTransliterateRule for more details. Rules are applied according to the principles of Partial and Multiple Replacements, as in the discussion above.
 */
virtual Boolean Translate(const TBaseText& sourceText, const TTextRange& sourceRange, TBaseText& replacementText, TTextRange& replacementTextRange) const;
virtual Boolean Translate(TBaseText& sourceText, const TTextRange& sourceRange, TTextRange& replacementTextRange, unsigned long& numNewChars) const;
/*
 * TranslateBack() uses a set of context sensitive rules to perform the transliteration. It takes the same arguments, but translates in the reverse direction, from target characters back to source characters. For example, with a Roman to Hebrew transliterator, Translate() will go from Roman to Hebrew; TranslateBack() will go from Hebrew to Roman.
 */
virtual Boolean TranslateBack(const TBaseText& sourceText, const TTextRange& sourceRange, TBaseText& replacementText, TTextRange& replacementTextRange) const;
virtual Boolean TranslateBack(TBaseText& sourceText, const TTextRange& sourceRange, TTextRange& replacementTextRange, unsigned long& numNewChars) const;
/*
 * Add and remove rules.
 */
virtual void AddTranslationRule(const TTransliterateRule& rule);
virtual void AddTranslationBackRule(const TTransliterateRule& rule);
virtual void RemoveTranslationRule(const TTransliterateRule& rule);
virtual void RemoveTranslationBackRule(const TTransliterateRule& rule);
/*
 * Methods for range variables. These variables can be used for a limited degree of "wildcarding". Any character that is not in the source or target set can be designated to be a range variable. Effectively, that character will match against any of the characters in the specified range. Inverse ranges can also be specified: in that case, a character matches against any character not in the specified range.
 * For example, if $ is defined as a range variable equal to "ei", then the two rules:
 *     c[$>S
 *     c>K
 * will cause "c" in front of "i" or "e" to be converted to "S", and in front of anything else to be converted to "K".
 */
virtual void AddVariable(TransliterateVariable variableName, const TBaseText& variableValue, Boolean inverse = FALSE);
virtual Boolean IsVariable(TransliterateVariable ch) const;
/* Checks to see if a particular unicode character represents a variable. */
/*
 * Create iterators for the rules.
 */
virtual TIterator* CreateIterator() const;
virtual TIterator* CreateTranslateBackIterator() const;
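The effect of the range-variable example given earlier for this class (the rules c[$>S and c>K, with $ equal to "ei") can be illustrated with the standalone sketch below. It does not use the Taligent classes: ApplyCRules is a hypothetical helper that hard-codes those two rules over a std::string, and its treatment of a "c" at the very end of the text (no following character, so the variable cannot match) is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Applies the two rules  c[$>S  and  c>K  to text, where $ is a range
// variable with the given contents. With inverse set, $ matches any
// character that is NOT in the contents.
std::string ApplyCRules(const std::string& text,
                        const std::string& rangeContents,
                        bool inverse = false) {
    assert(!rangeContents.empty());  // an empty range would be meaningless here
    std::string out;
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (text[i] != 'c') {
            out += text[i];
            continue;
        }
        // The succeeding context needs an actual following character.
        bool variableMatches = false;
        if (i + 1 < text.size()) {
            const bool inRange =
                rangeContents.find(text[i + 1]) != std::string::npos;
            variableMatches = (inRange != inverse);
        }
        // c[$>S has the longer source + succeeding context, so it wins
        // over c>K whenever both match.
        out += variableMatches ? 'S' : 'K';
    }
    return out;
}
```

For instance, with "ei" as the range, "cicada" becomes "SiKada" under this sketch; with the inverse flag set, the two outcomes swap and it becomes "KiSada".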
Class TTransliterateRule
TTransliterateRule implements a context sensitive transliteration rule. It is used within a TRuleBasedTransliterator.
Public Methods
/*
 * Construct a rule with four components. Note that these components cannot exceed 256 characters in length. (This is an arbitrary limitation imposed by the desire to save as much space in rule storage as possible.)
 */
TTransliterateRule(const TBaseText& keyText, const TBaseText& resultText, const TBaseText* const preceedContext = NIL, const TBaseText* const succeedContext = NIL);
/*
 * Use the rule described by the current object to perform translation. This method is called by TRuleBasedTransliterator::Translate().
 */
virtual long Translate(TBaseText& sourceText, unsigned long& sourceOffset, unsigned long& numReplacedChars, unsigned long& numNewChars) const;
/*
 * Determines if the rule defined by this object applies to a given text object.
 */
virtual Boolean DoesRuleApply(const TBaseText& text, const unsigned long textOffset, const TDictionary& variableTable, Boolean& variableMatch, unsigned long& translatableChars) const;
Class TKoreanTransliterator
TKoreanTransliterator implements Korean Jamo<->Hangul transliteration. It is a subclass of TTransliterator; however, the algorithm used is completely different from the rule-based transliteration implemented by TRuleBasedTransliterator. Background: Each Hangul character consists of two to three components, called Jamos. The first and second components are mandatory; however, the third is optional. The transliterator assumes that all defined Jamo and Hangul characters can be in the source text, even though some complex Jamos cannot be entered from certain keyboards. This means that another preprocessor is needed to compose the complex Jamos from the "type-able" Jamo. The reason that this step is separated from Jamo-Hangul transliteration is that the Jamo-Jamo composition is highly dependent on the keyboard layout whereas the Jamo-Hangul process is not.
Class THexTransliterator
THexTransliterator is derived from TTransliterator. It transforms hex numbers between one and four digits to their Unicode representations. When used inline, THexTransliterator provides a simple input method for generating Unicode characters that cannot be entered with a keyboard.
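The conversion that THexTransliterator is described as performing can be approximated by the sketch below. This is not the class's actual code: HexToCodePoint and ToUtf8 are hypothetical helpers, the output is encoded as UTF-8 purely so the result can be inspected in a std::string, and surrogate values (D800 to DFFF) are not given any special handling.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Parses one to four hex digits into a code point. Returns false (leaving
// codePoint untouched) if the input is empty, too long, or not hexadecimal.
bool HexToCodePoint(const std::string& digits, char32_t& codePoint) {
    if (digits.empty() || digits.size() > 4) return false;
    char32_t value = 0;
    for (char c : digits) {
        const unsigned char uc = static_cast<unsigned char>(c);
        if (!std::isxdigit(uc)) return false;
        const int digit = std::isdigit(uc) ? uc - '0'
                                           : std::tolower(uc) - 'a' + 10;
        value = value * 16 + static_cast<char32_t>(digit);
    }
    codePoint = value;
    return true;
}

// Encodes a code point of at most four hex digits (up to U+FFFF) as UTF-8.
std::string ToUtf8(char32_t cp) {
    assert(cp <= 0xFFFF);
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

As a usage example, the digits 915 (the value listed for ka in the Devanagari definitions later in this description) parse to code point 0x915.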
Class TSystemTransliterator
TSystemTransliterator is the user's interface to the system's transliterators. It can be used to get and set the current transliteration, and to query for the available transliterators on the system as well as perform all the public protocol of TTransliterator. It is notified when a change occurs in the system transliterator.
Public Methods (TTransliterator overrides)
/*
 * The following constants define the transliterators currently available in the system.
 */
static const TToken& kRomanTransliterator;
static const TToken& kJapaneseTransliterator;
static const TToken& kKatakanaTransliterator;
static const TToken& kArabicTransliterator;
static const TToken& kCapitalizeTransliterator;
static const TToken& kHebrewTransliterator;
static const TToken& kDevanagariTransliterator;
static const TToken& kRussianTransliterator;
static const TToken& kSymbolTransliterator;
static const TToken& kGreekTransliterator;
/*
 * Query for all available transliterators. Returns a TCollection of TTokens, each being the name of a transliterator.
 */
virtual void AvailableTransliterators(TCollection& transliterators) const;
/*
 * Gets and sets the system transliterator by name. If SetSystemTransliterator() is called, the next call to any public method in this class will reflect the change.
 */
virtual void GetSystemTransliteratorName(TLocaleName& name) const;
virtual void SetSystemTransliterator(const TLocaleName& name);
EXAMPLES IN ACCORDANCE WITH
A PREFERRED EMBODIMENT
Creating a TRuleBasedTransliterator from a Text File
The TRuleBasedTransliterator object can be created from a text file. The file contains an ordered sequence of definitions, range variables, forward rules and backward rules. The following example consists of fragments of a text file specification for Devanagari (Hindi). Comments are preceded with a hash mark (#).
The following are character identifier definitions. ($XXXX indicates a Unicode character.) Once an identifier definition has been processed, then any occurrence of that identifier within brackets will be replaced by the right-hand side. In these text file examples, the x) preceding context, y) source, z) succeeding context, a) result, b) rechecked result are represented as: x] y [z > a | b as discussed in more detail below.
For example:
ka=$915
kha=$916
ga=$917
gha=$918
nga=$919
virama=$94D
aa=$93E
i=$93F
ii=$940
u=$941
uu=$942
rh=$943
lh=$944
e=$947
ai=$948
o=$94B
au=$94C
The following are range variables:
&:{virama}{aa}{ai}{au}{ii}{i}{uu}{u}{rrh}{rh}{lh}{e}{o}
~:bcdfghjklmnpqrstvwxyz
The following convert from Latin letters to Devanagari letters.
kh>{kha} | {virama}
k>{ka} | {virama}
q>{ka} | {virama}
{virama}aa>{aa}
{virama}ai>{ai}
{virama}au>{au}
{virama}ii>{ii}
{virama}i>{i}
# otherwise convert independent forms when separated by ': k'ai -> {ka}{virama}{wai}
{virama}'aa>{waa}
{virama}'ai>{wai}
{virama}'au>{wau}
{virama}'ii>{wii}
# convert to independent forms at start of word or syllable:
# e.g. keai -> {ka}{e}{wai}; k'ai -> {ka}{wai}; (ai) -> ({wai})
aa>{waa}
ai>{wai}
au>{wau}
ii>{wii}
i>{wi}
The following rules convert back from Devanagari letters to Latin letters. Note that a less-than sign is used for the backwards rules.
# normal consonants
{kha}[&<kh
{kha}<kha
{ka}{virama}[{ha}<k'
{ka}[&<k
{ka}<ka
{gha}[&<gh
{gha}<gha
# dependent vowels (should never occur except following consonants)
{aa}<aa
{ai}<ai
{au}<au
{ii}<ii
# independent vowels (when following consonants)
~]{waa}<'aa
~]{wai}<'ai
~]{wau}<'au
~]{wii}<'ii
Translating Text Using the System Transliterator
void TranslateText(TBaseText& text) {
    // Instantiates a TSystemTransliterator. Will perform transliteration
    // based on the object currently chosen for the system. The default
    // is "Roman", which includes accent composition and smart quotes.
    TSystemTransliterator transliterator;
    TText replacementText;
    TTextRange replacementTextRange;
    if (transliterator.Translate(text, TTextRange(0, text.Length()), replacementText, replacementTextRange)) {
        text.DeleteText(replacementTextRange.RangeBegin(), replacementTextRange.RangeLength());
        text.InsertText(replacementText, text.Length());
    }
}
Creating and Using Pre-Defined Transliterator
void UpperCase(TBaseText& text) {
    // Instantiates the "Capitalize" transliterator. Note that
    // creating a transliterator is very expensive since it has
    // to either build or stream in the table. Therefore it's
    // best to create once only.
    static TRuleBasedTransliterator upperCase(TSystemTransliterator::kCapitalizeTransliterator);
    TText replacementText;
    TTextRange replacementTextRange;
    if (upperCase.Translate(text, TTextRange(0, text.Length()), replacementText, replacementTextRange)) {
        text.DeleteText(replacementTextRange.RangeBegin(), replacementTextRange.RangeLength());
        text.InsertText(replacementText, text.Length());
    }
    // To change text into lower case, call the TranslateBack() method.
}
void ChangeSystemTransliterator(const TLocaleName& newTransliteratorName) {
    // Instantiates a TSystemTransliterator.
    TSystemTransliterator transliterator;
    transliterator.SetSystemTransliterator(newTransliteratorName);
}
While the invention has been described in terms of a preferred embodiment in a specific system environment, those skilled in the art recognize that the invention can be practiced, with modification, in other and different hardware and software environments within the spirit and scope of the appended claims.
Claims (26)
1. A computer system for transliterating text as it is input to the computer, comprising:
(a) a storage for storing a plurality of transliteration rules;
(b) a display for displaying text as it is entered into a computer; and
(c) means for selecting a transliteration rule from the plurality of transliteration rules and applying the rule to the text and displaying the resultant text on the display.
2. The system as recited in claim 1, including processing means for initiating the text processing operation through an iconic operation.
3. The system as recited in claim 2, including processing means for initiating the text operation by double-clicking on an icon.
4. The system as recited in claim 2, including processing means for drop-launching the text operation.
5. The system as recited in claim 1, including:
(a) a storage for storing at least one preceding context rule;
(b) a storage for storing a source; and (c) a storage for storing at least one succeeding context rule.
6. The system as recited in claim 5, including a processor for applying the rules to text and marking appropriate text for rechecking.
7. The system as recited in claim 5, including means for applying inclusive and exclusive variables.
8. The system as recited in claim 5, including hashing means for increasing the performance of rule lookup in the storage.
9. The system as recited in claim 4, including means for calculating an optimal number of characters to process before applying the means for transliterating.
10. The system as recited in claim 1, including a display for composing, modifying and storing rules and variables for use in the transliteration operation.
11. The system as recited in claim 4, including means for transcribing from one script to another during input operations.
12. The system as recited in claim 1, including means for converting multiple characters into a single character representation of the multiple characters, storing the single character representation in a storage and displaying the single character representation on a display.
13. The system as recited in claim 1, including an object for transliterating forward and backward.
14. A method for transliterating text as it is input to a computer, comprising the steps of:
(a) storing a plurality of transliteration rules;
(b) displaying text as it is entered into the computer; and
(c) selecting a transliteration rule from the plurality of transliteration rules and applying the rule to the text and displaying the resultant text on the display.
15. The method as recited in claim 14, including the step of initiating the text processing operation through an iconic operation.
16. The method as recited in claim 15, including the step of initiating the text operation by double-clicking on an icon.
17. The method as recited in claim 15, including the step of drop-launching the text operation.
18. The method as recited in claim 14, including the steps of:
(a) storing at least one preceding context rule;
(b) storing a source; and
(c) storing at least one succeeding context rule.
19. The method as recited in claim 18, including the step of applying the rules to text and marking appropriate text for rechecking.
20. The method as recited in claim 17, including the step of applying inclusive and exclusive variables.
21. The method as recited in claim 17, including the step of increasing the performance of rule lookup in the storage.
22. The method as recited in claim 16, including the step of calculating an optimal number of characters to process before applying the means for transliterating.
23. The method as recited in claim 14, including the step of composing, modifying and storing rules and variables for use in the transliteration operation.
24. The method as recited in claim 16, including the step of transcribing from one script to another during input operations.
25. The method as recited in claim 14, including the step of converting multiple characters into a single character representation of the multiple characters, storing the single character representation in a storage and displaying the single character representation on a display.
26. The method as recited in claim 14, including the step of transliterating forward and backward.
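For illustration only, and not as part of the claims, the rule structure recited in claims 5, 8, 18 and 21 (a source with optional preceding and succeeding context, looked up through a hash) might be sketched in Python as follows. The names `Rule`, `Transliterator` and `RULES`, the `result` field, and the toy Latin-to-Greek rule set are all hypothetical. The claims do not say how rules are hashed, so keying on the first source character is an assumption, as is treating contexts as literal strings. Inclusive and exclusive variables, rechecking, and backward transliteration are not modelled.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule record: source text plus optional literal contexts."""
    source: str           # text to replace (assumed non-empty)
    result: str           # replacement text
    preceding: str = ""   # must appear immediately before the source
    succeeding: str = ""  # must appear immediately after the source


class Transliterator:
    def __init__(self, rules):
        # Assumed hashing scheme: bucket rules by the first source character,
        # so each input position only examines rules that could match there.
        self._index = defaultdict(list)
        for rule in rules:
            self._index[rule.source[0]].append(rule)
        # Within a bucket, try longer sources first, then rules with more context.
        for bucket in self._index.values():
            bucket.sort(
                key=lambda r: (len(r.source), len(r.preceding) + len(r.succeeding)),
                reverse=True,
            )

    def _match(self, text, pos):
        """Return the first rule whose source and contexts match at pos, or None."""
        for rule in self._index.get(text[pos], ()):
            end = pos + len(rule.source)
            if (
                text.startswith(rule.source, pos)
                and text[:pos].endswith(rule.preceding)
                and text.startswith(rule.succeeding, end)
            ):
                return rule
        return None

    def transliterate(self, text):
        """Apply rules left to right; unmatched characters pass through unchanged."""
        out = []
        pos = 0
        while pos < len(text):
            rule = self._match(text, pos)
            if rule is None:
                out.append(text[pos])
                pos += 1
            else:
                out.append(rule.result)
                pos += len(rule.source)
        return "".join(out)


# Toy rule set (an assumption, not taken from the patent).
RULES = [
    Rule("th", "θ"),
    Rule("t", "τ"),
    Rule("a", "α"),
    Rule("s", "ς", succeeding=" "),  # final sigma when followed by a space
    Rule("s", "σ"),
]

print(Transliterator(RULES).transliterate("thas ta"))  # prints "θας τα"
```

Contexts are checked against the input text rather than the output, which keeps the sketch simple. A system that transliterates during typing, as the claims describe, would additionally need to decide how much already-entered text to re-examine on each keystroke.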
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US08/053,790 US5432948A (en) | 1993-04-26 | 1993-04-26 | Object-oriented rule-based text input transliteration system |
US053,790 | 1993-04-26 |
Publications (1)
Publication Number | Publication Date |
---|---|
CA2145668A1 (en) | 1994-11-10 |
Family
ID=21986560
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CA002145668A Abandoned CA2145668A1 (en) | 1993-04-26 | 1994-01-03 | Text input transliteration system |
Country Status (7)
Country | Link |
---|---|
US (1) | US5432948A (en) |
EP (1) | EP0686286B1 (en) |
JP (1) | JPH08509829A (en) |
AU (1) | AU6019594A (en) |
CA (1) | CA2145668A1 (en) |
DE (1) | DE69400869T2 (en) |
WO (1) | WO1994025922A1 (en) |
Families Citing this family (86)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
AU6018694A (en) * | 1993-04-26 | 1994-11-21 | Taligent, Inc. | Text transliteration system |
DE69430421T2 (en) * | 1994-01-14 | 2003-03-06 | Sun Microsystems Inc | Method and device for automating the environment adaptation of computer programs |
US5873111A (en) * | 1996-05-10 | 1999-02-16 | Apple Computer, Inc. | Method and system for collation in a processing system of a variety of distinct sets of information |
US5999972A (en) | 1996-07-01 | 1999-12-07 | Sun Microsystems, Inc. | System, method and article of manufacture for a distributed computer system framework |
US6434598B1 (en) | 1996-07-01 | 2002-08-13 | Sun Microsystems, Inc. | Object-oriented system, method and article of manufacture for a client-server graphical user interface (#9) framework in an interprise computing framework system |
US5987245A (en) | 1996-07-01 | 1999-11-16 | Sun Microsystems, Inc. | Object-oriented system, method and article of manufacture (#12) for a client-server state machine framework |
US6266709B1 (en) | 1996-07-01 | 2001-07-24 | Sun Microsystems, Inc. | Object-oriented system, method and article of manufacture for a client-server failure reporting process |
US6424991B1 (en) | 1996-07-01 | 2002-07-23 | Sun Microsystems, Inc. | Object-oriented system, method and article of manufacture for a client-server communication framework |
US6272555B1 (en) | 1996-07-01 | 2001-08-07 | Sun Microsystems, Inc. | Object-oriented system, method and article of manufacture for a client-server-centric interprise computing framework system |
US5848246A (en) | 1996-07-01 | 1998-12-08 | Sun Microsystems, Inc. | Object-oriented system, method and article of manufacture for a client-server session manager in an interprise computing framework system |
US6304893B1 (en) | 1996-07-01 | 2001-10-16 | Sun Microsystems, Inc. | Object-oriented system, method and article of manufacture for a client-server event driven message framework in an interprise computing framework system |
US6038590A (en) | 1996-07-01 | 2000-03-14 | Sun Microsystems, Inc. | Object-oriented system, method and article of manufacture for a client-server state machine in an interprise computing framework system |
US5802533A (en) * | 1996-08-07 | 1998-09-01 | Walker; Randall C. | Text processor |
US6085162A (en) * | 1996-10-18 | 2000-07-04 | Gedanken Corporation | Translation system and method in which words are translated by a specialized dictionary and then a general dictionary |
US6745381B1 (en) | 1997-12-12 | 2004-06-01 | International Business Machines Coroporation | Method and apparatus for annotating static object models with business rules |
US6016477A (en) * | 1997-12-18 | 2000-01-18 | International Business Machines Corporation | Method and apparatus for identifying applicable business rules |
US6963871B1 (en) * | 1998-03-25 | 2005-11-08 | Language Analysis Systems, Inc. | System and method for adaptive multi-cultural searching and matching of personal names |
US8812300B2 (en) | 1998-03-25 | 2014-08-19 | International Business Machines Corporation | Identifying related names |
US8855998B2 (en) | 1998-03-25 | 2014-10-07 | International Business Machines Corporation | Parsing culturally diverse names |
US6170000B1 (en) * | 1998-08-26 | 2001-01-02 | Nokia Mobile Phones Ltd. | User interface, and associated method, permitting entry of Hangul sound symbols |
US6389386B1 (en) | 1998-12-15 | 2002-05-14 | International Business Machines Corporation | Method, system and computer program product for sorting text strings |
US6460015B1 (en) * | 1998-12-15 | 2002-10-01 | International Business Machines Corporation | Method, system and computer program product for automatic character transliteration in a text string object |
US6496844B1 (en) | 1998-12-15 | 2002-12-17 | International Business Machines Corporation | Method, system and computer program product for providing a user interface with alternative display language choices |
US7099876B1 (en) | 1998-12-15 | 2006-08-29 | International Business Machines Corporation | Method, system and computer program product for storing transliteration and/or phonetic spelling information in a text string class |
US6363342B2 (en) * | 1998-12-18 | 2002-03-26 | Matsushita Electric Industrial Co., Ltd. | System for developing word-pronunciation pairs |
US6493694B1 (en) * | 1999-04-01 | 2002-12-10 | Qwest Communications Interational Inc. | Method and system for correcting customer service orders |
JP2001125915A (en) * | 1999-10-28 | 2001-05-11 | Fujitsu Ltd | Information retrieving device |
JP4291532B2 (en) * | 1999-11-17 | 2009-07-08 | 国際連合 | Language conversion system |
IL134328A0 (en) * | 2000-02-02 | 2001-04-30 | Cellcom Israel Ltd | Cellular telecommunication network for transmitting transliterated text messages and method therefor |
TW561360B (en) * | 2000-08-22 | 2003-11-11 | Ibm | Method and system for case conversion |
US6692170B2 (en) | 2001-02-21 | 2004-02-17 | Eli Abir | Method and apparatus for text input |
WO2002097663A1 (en) * | 2001-05-31 | 2002-12-05 | University Of Southern California | Integer programming decoder for machine translation |
US8214196B2 (en) | 2001-07-03 | 2012-07-03 | University Of Southern California | Syntax-based statistical translation model |
US7136803B2 (en) * | 2001-09-25 | 2006-11-14 | Apple Computer, Inc. | Japanese virtual dictionary |
CA2475857C (en) * | 2002-03-11 | 2008-12-23 | University Of Southern California | Named entity translation |
US20040002850A1 (en) * | 2002-03-14 | 2004-01-01 | Shaefer Leonard Arthur | System and method for formulating reasonable spelling variations of a proper name |
WO2004001623A2 (en) | 2002-03-26 | 2003-12-31 | University Of Southern California | Constructing a translation lexicon from comparable, non-parallel corpora |
WO2004100016A1 (en) * | 2003-05-06 | 2004-11-18 | America Online Incorporated | Non-dictionary based japanese language tokenizer |
US8548794B2 (en) | 2003-07-02 | 2013-10-01 | University Of Southern California | Statistical noun phrase translation |
US7711545B2 (en) * | 2003-07-02 | 2010-05-04 | Language Weaver, Inc. | Empirical methods for splitting compound words with application to machine translation |
US7369986B2 (en) * | 2003-08-21 | 2008-05-06 | International Business Machines Corporation | Method, apparatus, and program for transliteration of documents in various Indian languages |
US8200475B2 (en) * | 2004-02-13 | 2012-06-12 | Microsoft Corporation | Phonetic-based text input method |
WO2005089340A2 (en) * | 2004-03-15 | 2005-09-29 | University Of Southern California | Training tree transducers |
US8296127B2 (en) * | 2004-03-23 | 2012-10-23 | University Of Southern California | Discovery of parallel text portions in comparable collections of corpora and training using comparable texts |
US20070005586A1 (en) * | 2004-03-30 | 2007-01-04 | Shaefer Leonard A Jr | Parsing culturally diverse names |
US8666725B2 (en) * | 2004-04-16 | 2014-03-04 | University Of Southern California | Selection and use of nonstatistical translation components in a statistical machine translation framework |
US8600728B2 (en) | 2004-10-12 | 2013-12-03 | University Of Southern California | Training for a text-to-text application which uses string to tree conversion for training and decoding |
US7376648B2 (en) * | 2004-10-20 | 2008-05-20 | Oracle International Corporation | Computer-implemented methods and systems for entering and searching for non-Roman-alphabet characters and related search systems |
US8886517B2 (en) | 2005-06-17 | 2014-11-11 | Language Weaver, Inc. | Trust scoring for language translation systems |
US8676563B2 (en) | 2009-10-01 | 2014-03-18 | Language Weaver, Inc. | Providing human-generated and machine-generated trusted translations |
US7974833B2 (en) | 2005-06-21 | 2011-07-05 | Language Weaver, Inc. | Weighted system of expressing language information using a compact notation |
US7389222B1 (en) | 2005-08-02 | 2008-06-17 | Language Weaver, Inc. | Task parallelization in a text-to-text system |
US7813918B2 (en) * | 2005-08-03 | 2010-10-12 | Language Weaver, Inc. | Identifying documents which form translated pairs, within a document collection |
US7624020B2 (en) * | 2005-09-09 | 2009-11-24 | Language Weaver, Inc. | Adapter for allowing both online and offline training of a text to text system |
CN100483399C (en) * | 2005-10-09 | 2009-04-29 | 株式会社东芝 | Training transliteration model, segmentation statistic model and automatic transliterating method and device |
US10319252B2 (en) | 2005-11-09 | 2019-06-11 | Sdl Inc. | Language capability assessment and training apparatus and techniques |
US8943080B2 (en) | 2006-04-07 | 2015-01-27 | University Of Southern California | Systems and methods for identifying parallel documents and sentence fragments in multilingual document collections |
US8077974B2 (en) | 2006-07-28 | 2011-12-13 | Hewlett-Packard Development Company, L.P. | Compact stylus-based input technique for indic scripts |
US8886518B1 (en) | 2006-08-07 | 2014-11-11 | Language Weaver, Inc. | System and method for capitalizing machine translated text |
US8433556B2 (en) | 2006-11-02 | 2013-04-30 | University Of Southern California | Semi-supervised training for statistical word alignment |
US9122674B1 (en) | 2006-12-15 | 2015-09-01 | Language Weaver, Inc. | Use of annotations in statistical machine translation |
US8468149B1 (en) | 2007-01-26 | 2013-06-18 | Language Weaver, Inc. | Multi-lingual online community |
US8615389B1 (en) | 2007-03-16 | 2013-12-24 | Language Weaver, Inc. | Generation and exploitation of an approximate language model |
US8831928B2 (en) | 2007-04-04 | 2014-09-09 | Language Weaver, Inc. | Customizable machine translation service |
CN105204617B (en) * | 2007-04-11 | 2018-12-14 | 谷歌有限责任公司 | The method and system integrated for Input Method Editor |
US8005664B2 (en) * | 2007-04-30 | 2011-08-23 | Tachyon Technologies Pvt. Ltd. | System, method to generate transliteration and method for generating decision tree to obtain transliteration |
EG25474A (en) * | 2007-05-21 | 2012-01-11 | Sherikat Link Letatweer Elbarmaguey At Sae | Method for translitering and suggesting arabic replacement for a given user input |
US8825466B1 (en) | 2007-06-08 | 2014-09-02 | Language Weaver, Inc. | Modification of annotated bilingual segment pairs in syntax-based machine translation |
WO2009049049A1 (en) * | 2007-10-09 | 2009-04-16 | Language Analytics Llc | Method and system for adaptive transliteration |
US8463597B2 (en) | 2008-05-11 | 2013-06-11 | Research In Motion Limited | Mobile electronic device and associated method enabling identification of previously entered data for transliteration of an input |
JP2010055235A (en) * | 2008-08-27 | 2010-03-11 | Fujitsu Ltd | Translation support program and system thereof |
US8990064B2 (en) | 2009-07-28 | 2015-03-24 | Language Weaver, Inc. | Translating documents based on content |
US8380486B2 (en) | 2009-10-01 | 2013-02-19 | Language Weaver, Inc. | Providing machine-generated translations and corresponding trust levels |
US10417646B2 (en) | 2010-03-09 | 2019-09-17 | Sdl Inc. | Predicting the cost associated with translating textual content |
US11003838B2 (en) | 2011-04-18 | 2021-05-11 | Sdl Inc. | Systems and methods for monitoring post translation editing |
US8694303B2 (en) | 2011-06-15 | 2014-04-08 | Language Weaver, Inc. | Systems and methods for tuning parameters in statistical machine translation |
US8886515B2 (en) | 2011-10-19 | 2014-11-11 | Language Weaver, Inc. | Systems and methods for enhancing machine translation post edit review processes |
US9158762B2 (en) | 2012-02-16 | 2015-10-13 | Flying Lizard Languages, Llc | Deconstruction and construction of words of a polysynthetic language for translation purposes |
US8942973B2 (en) | 2012-03-09 | 2015-01-27 | Language Weaver, Inc. | Content page URL translation |
US10261994B2 (en) | 2012-05-25 | 2019-04-16 | Sdl Inc. | Method and system for automatic management of reputation of translators |
US9152622B2 (en) | 2012-11-26 | 2015-10-06 | Language Weaver, Inc. | Personalized machine translation via online adaptation |
US9218341B2 (en) * | 2013-08-26 | 2015-12-22 | Lingua Next Technologies Pvt. Ltd. | Method and system for language translation |
KR101995741B1 (en) * | 2013-10-04 | 2019-07-03 | 오에스랩스 피티이 리미티드 | A gesture based system for translation and transliteration of input text and a method thereof |
US9213694B2 (en) | 2013-10-10 | 2015-12-15 | Language Weaver, Inc. | Efficient online domain adaptation |
US20180232598A1 (en) * | 2017-02-10 | 2018-08-16 | Microsoft Technology Licensing, Llc | Recursive object oriented pattern matching |
US11455476B2 (en) * | 2017-04-05 | 2022-09-27 | TSTREET Pty Ltd | Language translation aid |
Family Cites Families (23)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CA1184305A (en) * | 1980-12-08 | 1985-03-19 | Russell J. Campbell | Error correcting code decoder |
GB2125197B (en) * | 1982-07-22 | 1987-03-04 | Pingyi Zhi | Encoding chinese characters |
US5091950A (en) * | 1985-03-18 | 1992-02-25 | Ahmed Moustafa E | Arabic language translating device with pronunciation capability using language pronunciation rules |
US4951202A (en) * | 1986-05-19 | 1990-08-21 | Yan Miin J | Oriental language processing system |
US4821220A (en) * | 1986-07-25 | 1989-04-11 | Tektronix, Inc. | System for animating program operation and displaying time-based relationships |
US4885717A (en) * | 1986-09-25 | 1989-12-05 | Tektronix, Inc. | System for graphically representing operation of object-oriented programs |
US4891630A (en) * | 1988-04-22 | 1990-01-02 | Friedman Mark B | Computer vision system with improved object orientation technique |
EP0347162A3 (en) * | 1988-06-14 | 1990-09-12 | Tektronix, Inc. | Apparatus and methods for controlling data flow processes by generated instruction sequences |
US5041992A (en) * | 1988-10-24 | 1991-08-20 | University Of Pittsburgh | Interactive method of developing software interfaces |
US5133075A (en) * | 1988-12-19 | 1992-07-21 | Hewlett-Packard Company | Method of monitoring changes in attribute values of object in an object-oriented database |
US5050090A (en) * | 1989-03-30 | 1991-09-17 | R. J. Reynolds Tobacco Company | Object placement method and apparatus |
US5113342A (en) * | 1989-04-26 | 1992-05-12 | International Business Machines Corporation | Computer method for executing transformation rules |
US5060276A (en) * | 1989-05-31 | 1991-10-22 | At&T Bell Laboratories | Technique for object orientation detection using a feed-forward neural network |
US5125091A (en) * | 1989-06-08 | 1992-06-23 | Hazox Corporation | Object oriented control of real-time processing |
US5276616A (en) * | 1989-10-16 | 1994-01-04 | Sharp Kabushiki Kaisha | Apparatus for automatically generating index |
US5181162A (en) * | 1989-12-06 | 1993-01-19 | Eastman Kodak Company | Document management and production system |
US5093914A (en) * | 1989-12-15 | 1992-03-03 | At&T Bell Laboratories | Method of controlling the execution of object-oriented programs |
US5075848A (en) * | 1989-12-22 | 1991-12-24 | Intel Corporation | Object lifetime control in an object-oriented memory protection mechanism |
US5329446A (en) * | 1990-01-19 | 1994-07-12 | Sharp Kabushiki Kaisha | Translation machine |
US5151987A (en) * | 1990-10-23 | 1992-09-29 | International Business Machines Corporation | Recovery objects in an object oriented computing environment |
US5224039A (en) * | 1990-12-28 | 1993-06-29 | International Business Machines Corporation | Method of enabling the translation of machine readable information |
US5224040A (en) * | 1991-03-12 | 1993-06-29 | Tou Julius T | Method for translating chinese sentences |
US5119475A (en) * | 1991-03-13 | 1992-06-02 | Schlumberger Technology Corporation | Object-oriented framework for menu definition |
1993
- 1993-04-26 US US08/053,790 (US5432948A): not active, Expired - Lifetime
1994
- 1994-01-03 WO PCT/US1994/000081 (WO1994025922A1): active, IP Right Grant
- 1994-01-03 AU AU60195/94A (AU6019594A): not active, Abandoned
- 1994-01-03 EP EP94906507A (EP0686286B1): not active, Expired - Lifetime
- 1994-01-03 JP JP6524219A (JPH08509829A): not active, Ceased
- 1994-01-03 CA CA002145668A (CA2145668A1): not active, Abandoned
- 1994-01-03 DE DE69400869T (DE69400869T2): not active, Expired - Fee Related
Also Published As
Publication number | Publication date |
---|---|
EP0686286B1 (en) | 1996-11-06 |
DE69400869T2 (en) | 1997-05-15 |
AU6019594A (en) | 1994-11-21 |
DE69400869D1 (en) | 1996-12-12 |
WO1994025922A1 (en) | 1994-11-10 |
JPH08509829A (en) | 1996-10-15 |
EP0686286A1 (en) | 1995-12-13 |
US5432948A (en) | 1995-07-11 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US5432948A (en) | Object-oriented rule-based text input transliteration system | |
US5640587A (en) | Object-oriented rule-based text transliteration system | |
US6246976B1 (en) | Apparatus, method and storage medium for identifying a combination of a language and its character code system | |
EP0370774B1 (en) | Machine translation system | |
US5418718A (en) | Method for providing linguistic functions of English text in a mixed document of single-byte characters and double-byte characters | |
JPS6091450A (en) | Table type language interpreter | |
US5802482A (en) | System and method for processing graphic language characters | |
Saharia et al. | LuitPad: a fully unicode compatible Assamese writing software | |
JP2943791B2 (en) | Language identification device, language identification method, and recording medium recording language identification program | |
JP3483585B2 (en) | Document search device and document search method | |
Simons et al. | Multilingual data processing in the CELLAR environment | |
JPH0232467A (en) | Machine translation system | |
JP2023169063A (en) | KEARM learning conversion of Japanese input system | |
JPH11203281A (en) | Electronic dictionary retrieving device and medium stored with control program for the device | |
KR20010003037A (en) | Multilingual Input Device | |
JPH09146937A (en) | Device and method for character string conversion | |
JPH09274615A (en) | Style converting device | |
JPH04230532A (en) | Method for controlling transaltion from source language of information on display screen to object language | |
JPS62271061A (en) | Mechanical translation system | |
JPH0345423B2 (en) | ||
RIGHTS et al. | Copyright Information | |
JP2001005814A (en) | Device and method for preparing dictionary, computer readable recording medium recording dictionary preparation program and translation device | |
Raghavendra et al. | Transliteration editors for Arabic, Persian and Urdu | |
JPH0520355A (en) | System for discriminating correspondence | |
JPH0769908B2 (en) | Document processor |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
FZDE | Discontinued |