WO2009002141A1 - A system and method of language translation - Google Patents


Info

Publication number
WO2009002141A1
Authority
WO
WIPO (PCT)
Prior art keywords
word
phrase
translation
sentence
technique
Application number
PCT/MY2008/000061
Other languages
French (fr)
Inventor
Abdul Aziz Normaziah
Bin Ab. Rahman Suhaimi
Original Assignee
Mimos Berhad
Application filed by Mimos Berhad
Publication of WO2009002141A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/40: Processing or translation of natural language
    • G06F40/42: Data-driven translation
    • G06F40/47: Machine-assisted translation, e.g. using translation memory

Abstract

The present invention generally relates to a system for translating a language automatically, wherein hardware such as a computer is used in the system. The system comprises a sentence-level matching technique and a phrase look-up matching technique. Using the sentence-level matching technique, the system searches a translation memory to find out whether there are closely matched examples. When a record is found, the translation suggestion for the input sentence is returned; when no more records are matched, the phrase look-up matching technique is applied. In that case, a bi-section technique analyzes the input sentence by splitting it into two parts: the first part runs from the first word to a splitting point, and the second part holds the rest of the sentence. The longest chunk of the input sentence is searched for, moving from left to right. After an output is obtained from the first part of the original phrase, the second part acts as the original phrase and is split further.

Description

A SYSTEM AND METHOD OF LANGUAGE TRANSLATION
FIELD OF THE INVENTION
The present invention relates to a system and method of language translation.
More particularly, the present invention relates to a system and method of language translation that utilizes a translation memory method with a phrase look-up approach and a word alignment information database.
BACKGROUND OF THE INVENTION
Several methods of language translation have been introduced in the prior art. Examples include Hua et al., 2005; Simard and Langlais, 2001; Macklovitch and Russell, 2000; and Hanas and Furuse, 2000. In the prior art, conventional translation methods retrieve examples matched with the input sentence at sentence level. With such an approach, the method provides a good translation suggestion only when there are closely matched examples.
Basically, in the prior art, the method can be divided into three steps:
(a) recording translation pairs (inclusive of word alignment information)
(b) retrieving examples from the translation memory by using search engines
(c) applying an online learning mechanism
In another prior art reference, Simard and Langlais, 2001, the examples are ranked according to the length of the matched sub-sequences of words.
Therefore, it is an objective of the present invention to introduce a new method of translation wherein a good translation suggestion is only provided when there are closely matched examples. Further to this, the present invention adopts the longest available sub-sequence to cover as much of the source sentence as possible. The present invention is also designed to provide a user interface for a translator to perform an interactive translation process.
SUMMARY OF THE INVENTION
The present invention relates to a system for translating a language automatically, wherein hardware such as a computer is used in the system and wherein the system comprises a sentence-level matching technique and a phrase look-up matching technique. Using the sentence-level matching technique, the system searches a translation memory to find out whether there are closely matched examples. When a record is found, the translation suggestion for the input sentence is returned; when no more records are matched, the phrase look-up matching technique is applied. A bi-section technique analyzes the input sentence by splitting it into two parts. The first part runs from the first word to a splitting point, and the second part holds the rest of the sentence. The longest chunk of the input sentence is searched for, moving from left to right. After an output is obtained from the first part of the original phrase, the second part acts as the original phrase and is split further.
According to the present invention, this process is repeated and the results for all these sub-phrases are aggregated into the basic output. A repetition avoidance technique is applied thereafter, wherein this technique accumulates and identifies the equivalent meanings or repeated words for a certain word. Said technique of accumulating and identifying words of similar meaning is based on the equivalence search for source and target words retrieved from the word alignment information database. Further to this, an inter-phrase word-to-word distance summation technique is used, in which a summation value is calculated to give an estimate of which word is repeated and needs to be omitted and which one is not.
The present invention also relates to a method of automatic language translation, wherein the method uses hardware and wherein, when translating a new sentence, the system first searches for an example in the word phrase database or the translation memory database. The sentence that is input into the system is compared to the source language of the examples from the phrase and language database. If there is an exactly matched example, the target part of this example is suggested as the translation. Before the input sentence is processed, a module that filters each word in the sentence is invoked; any punctuation, symbol or word space is extracted and all words are thereafter merged. By using normalization, two sentences with the same words are matched even though the words are in a different order. When no exact match is obtained, the method searches for examples in the next database, the translation memory database. Phrase look-up matching is used to find a suggested meaning for a phrase by parsing the phrase into sub-phrases, finding a meaning for these sub-phrases and combining the results to get a final output.
The method further includes an input occurrence detection and verification process that is used by the phrase look-up algorithm in order to commence the translation process. If there is no meaning or the word is not in the input, the system simply skips to the next word or phrase.
According to the present invention, the system and method are implemented via hardware such as a computer or any other device of similar capacity, and the system and method can work locally, remotely, over a network or via the internet.
BRIEF DESCRIPTION OF THE FIGURES
Figure 1 shows a bi-section algorithm flow process.
Figure 2 shows a flow diagram of a SAT and WAT used by the bi-section algorithm as shown in Figure 1.
Figure 3 shows a flow chart of an example of a basic output with a repetition of word.
Figure 4 shows a flow chart of getting a final output using an inter-phrase word-to-word distance summation algorithm.
Figure 5 shows a detailed flow chart of the method and system of language translation according to the present invention.
Figure 6 shows a diagram of an example of word filtering formatting output.
Figure 7 shows a chart for translation suggestion word phrases using translation memory database.
Figure 8 shows a chart for translation suggestion word phrases using phrase database.
Figure 9 shows a chart showing an example of the phrase look up matching technique.
Figure 10 shows a graph showing the relation between the index and the inter-phrase distance summation.
DETAILED DESCRIPTION OF THE PRESENT INVENTION
The present invention will now be described with reference to the accompanying figures but is not limited thereto. It should also be noted that the present invention can be used for all types of language translation. However, for the purpose of this description, the invention is described with reference to an English-Malay translation method.
The present invention mainly employs two techniques: (1) sentence-level matching and (2) phrase look-up matching. Firstly, the present invention searches the translation memory to find out whether there are closely matched examples. In this step, the present invention uses the sentence-level matching technique. According to the present invention, if a record is found, the translation method returns the translation suggestion for the input sentence. If no more records are matched, the present invention performs the phrase look-up matching method. This method segments the input sentence into several sub-sequences using a bi-section algorithm. The bi-section technique analyzes an input sentence by splitting it into two parts. The first part runs from the first word to a splitting point, and the second part holds the rest of the sentence. The purpose of this method is to search for the longest chunk of the input sentence, moving from left to right. After an output is obtained from the first part of the original phrase, the second part acts as the original phrase and is split further. According to the present invention, this process is repeated as before and the results for all these sub-phrases are aggregated into the basic output. Reference is made to Figure 1, which shows the details of the bi-section algorithm process. Reference is also made to Figure 2, which shows the two types of database table, the sentence alignment table (SAT) and the word alignment table (WAT), that store all the information required by the present invention.
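The bi-section segmentation described above can be sketched as a greedy longest-prefix search. This is only a minimal illustration under stated assumptions: the exact splitting-point strategy is defined in Figure 1, which is not reproduced here, so the sketch assumes the split point starts at the end of the phrase and moves left until a match is found. The function name `bisect_segments` and the `has_match` callback (standing in for the database look-up) are hypothetical.

```python
def bisect_segments(words, has_match):
    """Split `words` into the longest chunks that `has_match` accepts.

    `has_match` is a hypothetical stand-in for the database look-up; it
    receives a tuple of words and returns True if a translation exists.
    """
    segments = []
    start = 0
    while start < len(words):
        # Try the longest first part, then move the splitting point left.
        for end in range(len(words), start, -1):
            chunk = tuple(words[start:end])
            if has_match(chunk):
                segments.append(chunk)
                start = end  # the second part becomes the new phrase
                break
        else:
            # No chunk starting at this word has a match: skip the word.
            start += 1
    return segments
```

For instance, with an assumed look-up that knows only "if you", "choose" and "a program", the input "if you choose a program" would be segmented into those three chunks, and their translations would then be aggregated into the basic output.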
The repetition avoidance technique is applied thereafter, wherein this method accumulates and identifies the equivalent meanings or repeated Malay words for a certain English word. The method of accumulating and identifying words of similar meaning is based on the equivalence search for source and target words retrieved from the word alignment information database. These repeated words are generated because of the way the source (English) and target (Malay) pairs of sentences are dealt with. Basically, the structure of the Malay sentence generated from this process is acceptable, but it may contain repeated words that need to be omitted. Figure 3 shows an example of how a repeated word "kantu" occurs twice in the basic output.
Next, the inter-phrase word-to-word distance summation technique is used. This method is an algorithm that uses mathematical calculation to obtain a summation value that gives a clue as to which word is repeated and needs to be omitted and which one is not. Figure 4 shows the basic process for obtaining a final output using this method. According to this method, three steps are involved in calculating the word-to-word distance summation value:
- determining the location of each word in the basic output ($loc_i$) and in the SAT table ($loc_j$)
- calculating the distance between two words ($loc_i$, $loc_j$), wherein
$$d_i = |loc_j - loc_i|$$
in which $i, j = 0, 1, \dots, n$
$loc_i$ = location value for the basic output
$loc_j$ = location value for the SAT's target sentence
- summing all the distance values generated from $d_i$, as given by the equation below
[Equation image imgf000008_0001 (the summation ∑d) is not reproduced in this text.]
To further describe the present invention, reference is made to Figure 5, which shows the details of the method according to the present invention. When translating a new sentence, the system first searches for an example in the word phrase database or the translation memory database. The sentence that is input into the system is compared to the source language of the examples from the phrase and language database.
If there is an exactly matched example, the target part of this example is suggested as the translation. However, there could be some instances in which the system fails to find a related example in the databases. In order to eliminate such problems, three methods are applied within the sentence-level matching technique.
Before the input sentence is processed, a module that filters each word in the sentence is invoked. Any punctuation, symbol or word space is extracted and all words are thereafter merged (Figure 6). Then, by using normalization, two sentences with the same words can be matched even though the words are in a different order. For example, "Vanilla Ice Cream" can be matched with "Ice Cream Vanilla".
This is done by using SQL commands such as LIKE together with the AND logical operator. Currently, there are 2500 Malay-English phrases. In the present invention, the inventors have created another database entry to locate all these phrases. The first step according to the present invention is to select a word from the word phrase database and to look up the meaning of the input sentence. If there is an exactly matched example, the target part of this example is suggested as the translation. Otherwise, the present invention searches for examples in the next database, the translation memory database. Figures 7 and 8 show the results of translations being retrieved from the translation memory and the word phrase database.
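The filtering and order-insensitive matching can be illustrated with a short sketch that builds one LIKE condition per word and joins them with AND, as the text describes. The table name `phrases`, its columns `source` and `target`, and the helper names are assumptions made for illustration; the actual schema is not given in the description. The sketch also assumes that the stored source phrases have already been filtered.

```python
import re
import sqlite3


def filter_words(sentence):
    """Strip punctuation, symbols and extra spaces; return the bare words."""
    return re.findall(r"[A-Za-z0-9]+", sentence)


def lookup(conn, sentence):
    """Return (source, target) rows whose source has the same words in any order."""
    words = filter_words(sentence)
    if not words:
        return []
    # One LIKE condition per word, joined with AND. Padding both sides with
    # spaces makes each pattern match whole words only.
    where = " AND ".join("(' ' || source || ' ') LIKE ?" for _ in words)
    params = [f"% {w} %" for w in words]
    rows = conn.execute(
        f"SELECT source, target FROM phrases WHERE {where}", params
    ).fetchall()
    # LIKE only shows that every input word occurs in the row, so the final
    # check requires the two word lists to be equal up to order and case.
    wanted = sorted(w.lower() for w in words)
    return [r for r in rows
            if sorted(w.lower() for w in filter_words(r[0])) == wanted]
```

With a row whose source is "Ice Cream Vanilla", a call such as `lookup(conn, "Vanilla Ice Cream")` would return that row, while a row holding only "Ice Cream" would be rejected by the LIKE condition for "Vanilla".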
Further to this, phrase look-up matching is used to find a suggested meaning for a phrase by parsing the phrase into sub-phrases, finding a meaning for these sub-phrases and combining the results to get a final output. The algorithm is based on the bi-section concept described earlier. Figure 9 shows an example of the phrase look-up matching process.
Input occurrence detection and verification is a two-stage process that is used by the phrase look-up algorithm in order to commence the translation process. It does not build the output, but it gives the green light to the basic output construction process and obtains the index value of the matching Sentence Alignment Table (SAT) entry. The first stage is detection, followed by the second stage, which is verification. Input occurrence detection is the process in which the SAT table is searched to find an entry that contains a word sequence that exactly matches the user input. The entry might contain more words than the input, and the input does not need to be at the beginning. For instance, if the input is "big industry" and the SAT table has an entry "gaming has become a big industry recently", that entry is considered a match.
Verification is the process that follows detection. In this process, the Word Alignment Table (WAT) is checked for the single words that make up the SAT entry. It is important to find a WAT entry sequence (which can consist of one or a few words) that exactly matches the user's input. By this process the present invention is able to verify that a meaning can be found for the input. For example, if the input is "I like to" and we have detected a SAT entry "I like to see the sky" whose corresponding WAT entries are "I", "like", "to see" and "the sky", the verification returns false. This means that the input cannot be matched exactly using the WAT entries, because combining the first three WAT entries gives "I like to see", which is not the same as "I like to". The third WAT entry would have to be "to" instead of "to see" in order to verify the occurrence of the input. The SAT entry is dropped and the system starts all over again with the detection process. This process keeps repeating until an entry is successfully detected and verified. The entries are stored in an array for further processing.
After the verification is completed, the output construction begins.
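The two stages can be sketched as follows, using the examples from the text. This is a minimal illustration, not the implementation: the function names `detect` and `verify` are hypothetical, the SAT entry is represented simply as a list of source words, and the WAT entries as a list of strings.

```python
def detect(input_words, sat_source_words):
    """Return the index at which the input occurs in the SAT entry, or -1."""
    n = len(input_words)
    for i in range(len(sat_source_words) - n + 1):
        if sat_source_words[i:i + n] == input_words:
            return i
    return -1


def verify(input_words, wat_entries):
    """Check that some run of consecutive WAT entries spells exactly the input."""
    for start in range(len(wat_entries)):
        collected = []
        for entry in wat_entries[start:]:
            collected.extend(entry.split())
            if len(collected) >= len(input_words):
                break
        if collected == input_words:
            return True
    return False
```

With these definitions, "big industry" is detected in "gaming has become a big industry recently", and "I like to" fails verification against the WAT entries "I", "like", "to see", "the sky" because the third entry overshoots the input.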
The basic output construction is the primary stage in generating the output. This output is an intermediate output that will be modified by the repetition avoidance stage to get the final output. It is correct in terms of structure, but might have some words repeated many times. In the present invention, the repetition avoidance algorithm is implemented to solve this problem. The output is constructed based on the SAT and WAT entries obtained from the above procedure. It is important to highlight the concept of source-target pairs, which is used throughout the rest of the algorithm. These pairs represent a source word and its target meaning as recorded in the WAT.
Initially, the present invention has an array of source-target pairs obtained from the verification of input occurrence above, which follows the source language sentence structure. Adding the target word for each relevant source word in the source language order will probably not make sense; e.g. for the English "red car", using the order from the source gives "merah kereta", whereas the correct translation in Malay is "kereta merah". To solve this problem, the target Malay entry in the SAT is selected; it has a correct structure because it is actually a Malay sentence. The words are taken one by one starting at the beginning; the system looks in the source-ordered array for the target word that matches the first SAT word and adds it as the first entry in the Malay-ordered pair array. The pair is then deleted from the English-ordered array so that it is not chosen again. Next, the second word in the SAT entry is selected, and the same procedure is repeated for the rest of the words until a target-ordered pair array is obtained.
If there is no meaning or the word is not in the input, the system would just skip to the next word or phrase. The system will keep on taking entries until it is done with the Malay ordered source-target pairs. Now, we have constructed the basic output.
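The reordering step described above can be sketched in Python as follows. This is a minimal illustration that assumes each pair holds a single source word and a single target word; the function name `reorder_pairs` and the sample Malay sentence are hypothetical.

```python
from typing import List, Tuple

Pair = Tuple[str, str]  # (source word, target word) as recorded in the WAT


def reorder_pairs(source_ordered: List[Pair], target_sentence: str) -> List[Pair]:
    """Rearrange source-ordered pairs to follow the word order of the
    target (Malay) sentence stored in the SAT entry."""
    remaining = list(source_ordered)   # copy, so the caller's list is untouched
    target_ordered: List[Pair] = []
    for word in target_sentence.split():
        for i, (_, target) in enumerate(remaining):
            if target == word:
                # Move the pair to the target-ordered array and delete it
                # from the source-ordered one so it is not chosen again.
                target_ordered.append(remaining.pop(i))
                break
        # No matching pair: the word has no meaning in the input, so skip it.
    return target_ordered


pairs = [("red", "merah"), ("car", "kereta")]
malay_order = reorder_pairs(pairs, "saya suka kereta merah")
# malay_order == [("car", "kereta"), ("red", "merah")]
```

Words of the SAT sentence that have no pair (here "saya" and "suka") are simply skipped, which mirrors the skip rule stated in the text.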
The basic output has no problems in structure, but it may contain repeated words. These repeated words are generated because of the way we deal with the source-target pairs. Let us consider this example:
The target word "anda" for the source "you" is repeated 4 times in the output although it is mentioned only once in the input. The SAT has the word "anda" 4 times, and each time the input agrees to include it in the output. We could simply take the first one, but what would happen if the input were "the kind of plants you can grow most successfully"? In that case we need the second "anda" in the output. We resolve this problem using a mathematical model that we have named the inter-phrase word-to-word distance summation.
This algorithm calculates a summation value that indicates which occurrence of a repeated word needs to be omitted and which one is to be kept.
Table 1 shows the value of loc for each entry in the basic output.
The underlined "anda" is a repeated "anda". Running the repetition avoidance algorithm first gives the ∑d for each word. Table 2 describes the details of obtaining ∑d for each word. The ∑d is used as the judgment value to omit the extra "anda". Since there are two occurrences of "you" in the basic output but only one in the input, one of them must be omitted. The row values belonging to the conflicted words are crossed out, and the rest are summed to get the ∑d of each word. We use the ∑d values in Table 3 to find the "you" index with the minimum summation. The other "you" is not taken into the final output. Figure 10 shows the plot of the summation values versus the index. The thick black dotted line represents the margin between accepted and rejected words, i.e. everything below the line is accepted and the rest is not. After removing the dropped words from the basic output, we have the final output as follows: "Jika anda memilih suatu program bukan kritikal".
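The selection rule can be sketched in Python as below. This is a hedged illustration, not the patented implementation. It assumes that loc is the zero-based position of a word in the basic output, that the distance d between two words is the absolute difference of their positions, and that conflicted (repeated) words are excluded from every sum. The function name `remove_repetitions`, the `allowed` mapping, and the sample basic output with a second "anda" at the end are invented for illustration.

```python
from collections import Counter
from typing import Dict, List


def remove_repetitions(basic_output: List[str], allowed: Dict[str, int]) -> List[str]:
    """Keep, for every over-repeated word, only the occurrences with the
    smallest distance summation to the non-conflicted words."""
    counts = Counter(basic_output)
    # A word is conflicted when it occurs more often than the input allows.
    conflicted = {w for w, c in counts.items() if c > allowed.get(w, c)}
    # Positions of non-conflicted words; only these contribute to the sums.
    anchors = [i for i, w in enumerate(basic_output) if w not in conflicted]
    keep = set(anchors)
    for word in conflicted:
        locs = [i for i, w in enumerate(basic_output) if w == word]
        sigma_d = {i: sum(abs(i - j) for j in anchors) for i in locs}
        # Smallest summation wins; ties go to the earlier position.
        best = sorted(locs, key=lambda i: (sigma_d[i], i))[:allowed[word]]
        keep.update(best)
    return [w for i, w in enumerate(basic_output) if i in keep]


basic = ["Jika", "anda", "memilih", "suatu", "program", "bukan", "kritikal", "anda"]
final = remove_repetitions(basic, {"anda": 1})
# " ".join(final) == "Jika anda memilih suatu program bukan kritikal"
```

In this made-up example the first "anda" has a summation of 16 and the second has 22, so the first is kept and the second is dropped.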
Several further modifications have been made to improve the application and workability of the present invention, such as:
(a) Implementing recursive code instead of nested loops to speed up the process.
(b) Using a simpler, linear model to implement the repetition avoidance algorithm, which chooses one pivot value and calculates the distance to the pivot.
(c) Deriving second-degree polynomial formulas to estimate the summation. By using them with single loops in the calculation of the inner-sentence word-to-word distance summation instead of nested loops, the computation time can be effectively reduced.
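As an illustration of improvement (c), a closed form of this kind can be written down under a simple assumption that is not stated in the source: the n words of a sentence occupy positions 1 to n, and the distance between the words at positions i and j is |i - j|. The summation for the word at position i is then a second-degree polynomial in i, so it can be evaluated without an inner loop:

```latex
\sum_{j=1}^{n} \lvert i - j \rvert
  = \sum_{k=1}^{i-1} k + \sum_{k=1}^{n-i} k
  = \frac{(i-1)\,i}{2} + \frac{(n-i)(n-i+1)}{2}
  = i^{2} - (n+1)\,i + \frac{n(n+1)}{2}
```

For example, with n = 3 and i = 2 the distances are 1, 0 and 1, which sum to 2, and the formula gives 4 - 8 + 6 = 2.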
In the present invention the system and method are carried out via hardware such as a computer or any other device of similar capacity, and the system and method can work locally, remotely, over a network or via the internet.

Claims

1. A system to translate a language automatically characterized in that wherein a hardware such as a computer is used in the system and wherein the system comprises of a sentence level technique and a phrase look-up matching technique and wherein the system searches a translation memory to find out whether there are closely matched examples and sentence level matching technique and wherein when a record is found, the translation technique would return the translation suggestion for the input sentence and when no more records are matched, then a phrase look-up matching technique is applied and wherein further to this a bi-section technique would analyze an input sentence by splitting it into two different parts and wherein a first part is from the first word until a splitting point and the second part is to hold the rest of the word/sentence and wherein the longest chunk for the input sentence is searched starting movement from the left portion of the word to the right portion of the word and wherein after obtaining an output from the first part of the original phrase, the second part will act as the original phrase and it will be split further.
2. A system to translate a language automatically as claimed in Claim 1 wherein the system would be repeated and the result for all these sub phrases would be aggregated for the basic output.
3. A system to translate a language automatically as claimed in Claim 1 wherein a repetition avoidance technique is applied thereafter wherein this technique would accumulate and identify the equivalent meaning or repeated words for a certain word and wherein said technique of accumulating and identifying of the similar meaning word is based on the equivalent search for source and target word retrieved from word alignment information database.
4. A system to translate a language automatically as claimed in Claim 1 wherein an inter-phrase word-to-word distance summation technique is used therein in which a summation value is calculated to give an estimated value on which word is repeated and needs to be omitted and which one is not.
5. A method of language translation automatically characterized in that wherein the method includes a hardware wherein when translating a new sentence, the system first searches the example from word phrase database or translation memory database and wherein the sentence that is input into the system is compared to the source language of the example from the phrase and language database and wherein if there is an exact matched example, then the target part of this example is suggested as the translation and wherein before the input sentence is processed, a module to filter each word in the sentence is invoked and any punctuation, symbol or word space will be extracted and all words would thereafter be merged and wherein by using normalization, two sentences with the same words are matched even though said similar words are in different order.
6. A method of language translation automatically as claimed in Claim 5 wherein when no exact match is obtained the method will search examples from the next database, translation memory database and wherein the phrase look-up matching is used to find suggested meaning for a phrase by parsing the phrase into sub phrases and finding a meaning to these sub phrases and combining the results to get a final output.
7. A method of language translation automatically as claimed in Claim 5 wherein the method further includes an input occurrence detection and a verification process that is used by the phrase look-up algorithm in order to commence the translation process.
8. A method of language translation automatically as claimed in Claim 5 wherein if there is no meaning or the word is not in the input, the system would just skip to the next word or phrase.
9. A system and method as claimed in any of the preceding claims wherein the system and method is conducted via hardware such as a computer or any other device in similar capacity and wherein the system and method could work locally or remotely or over a network or via the internet.
PCT/MY2008/000061 2007-06-27 2008-06-27 A system amd method of language translation WO2009002141A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
MYPI20071016 MY151645A (en) 2007-06-27 2007-06-27 A system and method of language translation
MYPI20071016 2007-06-27

Publications (1)

Publication Number Publication Date
WO2009002141A1 true WO2009002141A1 (en) 2008-12-31

Family

ID=40185838

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/MY2008/000061 WO2009002141A1 (en) 2007-06-27 2008-06-27 A system amd method of language translation

Country Status (2)

Country Link
MY (1) MY151645A (en)
WO (1) WO2009002141A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023240839A1 (en) * 2022-06-14 2023-12-21 平安科技(深圳)有限公司 Machine translation method and apparatus, and computer device and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20010029455A1 (en) * 2000-03-31 2001-10-11 Chin Jeffrey J. Method and apparatus for providing multilingual translation over a network
WO2002071259A1 (en) * 2001-03-06 2002-09-12 Worldlingo. Inc Seamless translation system
WO2004049195A2 (en) * 2002-11-22 2004-06-10 Transclick, Inc. System and method for language translation via remote devices

Also Published As

Publication number Publication date
MY151645A (en) 2014-06-30

Similar Documents

Publication Publication Date Title
CN101878476B (en) Machine translation for query expansion
US7272558B1 (en) Speech recognition training method for audio and video file indexing on a search engine
US8332205B2 (en) Mining transliterations for out-of-vocabulary query terms
US20040254795A1 (en) Speech input search system
US20080270138A1 (en) Audio content search engine
CN102567409A (en) Method and device for providing retrieval associated word
US11573989B2 (en) Corpus specific generative query completion assistant
WO2017161899A1 (en) Text processing method, device, and computing apparatus
WO2016143449A1 (en) Entailment pair expansion device, computer program therefor, and question-answering system
US8229970B2 (en) Efficient storage and retrieval of posting lists
CN117312500B (en) Semantic retrieval model building method based on ANN and BERT
JP4969209B2 (en) Search system
US9218336B2 (en) Efficient implementation of morphology for agglutinative languages
JP5189413B2 (en) Voice data retrieval system
JP4005477B2 (en) Named entity extraction apparatus and method, and numbered entity extraction program
WO2009002141A1 (en) A system amd method of language translation
KR100745367B1 (en) Method of index and retrieval of record based on template and question answering system using as the same
JP7305077B2 (en) Information processing device, abstract output method, and abstract output program
KR20210048368A (en) System for searching similar sentence and method for searching similar sentence thereof
CN112084777B (en) Entity linking method
KR102519955B1 (en) Apparatus and method for extracting of topic keyword
JP7428035B2 (en) Data retrieval device, data retrieval method and program
FR2864281A1 (en) Phonetic units and graphic units matching method for lexical mistake correction system, involves establishing connections between last units of graphic and phonetic series to constitute path segmenting graphic series by grapheme
JPH03268064A (en) Data base retrieving system
JP2006178865A (en) Device, method and program for extracting intrinsic expression, and recording medium with the program recorded thereon

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 08778984

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 08778984

Country of ref document: EP

Kind code of ref document: A1