CN102486770A - Character conversion method and system - Google Patents

Character conversion method and system

Info

Publication number
CN102486770A
CN102486770A CN2010105769587A CN201010576958A CN102486770A CN 102486770 A CN102486770 A CN 102486770A CN 2010105769587 A CN2010105769587 A CN 2010105769587A CN 201010576958 A CN201010576958 A CN 201010576958A CN 102486770 A CN102486770 A CN 102486770A
Authority
CN
China
Prior art keywords
words
language
target language
relevance
candidate target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2010105769587A
Other languages
Chinese (zh)
Other versions
CN102486770B (en)
Inventor
杨秉哲
吴世弘
谷圳
林倩慧
卢家庆
谢文泰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute for Information Industry
Original Assignee
Institute for Information Industry
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute for Information Industry
Priority to CN201010576958.7A
Publication of CN102486770A
Application granted
Publication of CN102486770B
Legal status: Active
Anticipated expiration

Abstract

The invention provides a character conversion method and system. The system comprises a storage unit, a classification unit and a conversion unit. The storage unit stores a word comparison table that records the word correspondences between a source language and a target language. The classification unit performs word segmentation on a text paragraph in the source language to obtain a plurality of segmentation results, and compares the segmentation results with the word comparison table to determine whether each source language word in the paragraph belongs to a first kind or a second kind, where a source language word of the first kind corresponds to a single target language word and a source language word of the second kind corresponds to a plurality of candidate target language words. The conversion unit converts each source language word of the first kind into its target language word according to the word comparison table. For each source language word of the second kind, it selects one of the candidate target language words as the target language word, according to the co-occurrence relevance of a plurality of associated words formed from each candidate target language word and the preceding and following words in the paragraph.

Description

Character conversion method and system
Technical field
The present invention relates to a character conversion method, and in particular to a character conversion method and system that handle a source language word corresponding to a plurality of target language words.
Background art
With the arrival of the global village era, people today often come into contact with information from all over the world. When faced with material written in an unfamiliar language, however, they frequently have to rely on a language conversion tool to convert it into a language they know.
Most language conversion tools convert words in the source language into the target language by looking them up in a comparison table. When the table fails to reflect the semantic gaps and differences in terminology between the languages, the conversion result is easily distorted. In addition, a single source language word can often be converted into several different target language words. Some language conversion tools handle this case by requiring the user to choose the target language word manually, because the tool itself cannot choose automatically. Other tools decide which target language word to use solely from the frequency of occurrence of each target language word. Statistically, however, this approach easily chooses the wrong target language word and cannot produce a highly accurate conversion result.
Summary of the invention
In view of this, the present invention provides a character conversion method that is particularly suited to automatically selecting a better conversion result for words with one-to-many correspondences during text conversion.
The present invention also provides a text conversion system that can handle differences in terminology between languages and thereby improve the correctness of text conversion.
The present invention proposes a character conversion method for converting a text paragraph in a source language into a target language, where the text paragraph comprises a plurality of source language words. The method comprises the following steps: providing a word comparison table that records the word correspondences between the source language and the target language; performing word segmentation on the text paragraph to obtain a plurality of segmentation results; comparing the segmentation results with the word comparison table to determine whether each source language word belongs to a first kind or a second kind, where a source language word of the first kind corresponds to only a single target language word and a source language word of the second kind corresponds to a plurality of candidate target language words; converting each source language word of the first kind in the text paragraph into its corresponding target language word according to the word correspondences recorded in the word comparison table; and, for each source language word of the second kind, selecting one of the candidate target language words as the target language word to convert to, according to the co-occurrence relevance of a plurality of associated words formed from each corresponding candidate target language word and at least one preceding or following word in the text paragraph.
The present invention proposes a text conversion system for converting a text paragraph in a source language into a target language, where the text paragraph comprises a plurality of source language words. The system comprises: a storage unit that stores a word comparison table recording the word correspondences between the source language and the target language; a classification unit, coupled to the storage unit, that performs word segmentation on the text paragraph to obtain a plurality of segmentation results and compares the segmentation results with the word comparison table to determine whether each source language word belongs to a first kind or a second kind, where a source language word of the first kind corresponds to only a single target language word and a source language word of the second kind corresponds to a plurality of candidate target language words; a conversion unit, coupled to the storage unit and the classification unit, that converts each source language word of the first kind in the text paragraph into its corresponding target language word according to the word correspondences recorded in the word comparison table and, for each source language word of the second kind, selects one of the candidate target language words as the target language word to convert to, according to the co-occurrence relevance of a plurality of associated words formed from each corresponding candidate target language word and at least one preceding or following word in the text paragraph; and an output unit, coupled to the conversion unit, that outputs the text paragraph converted into the target language.
The present invention further proposes a character conversion method for text conversion between a source language and a target language. The method comprises: obtaining a source language word from a text paragraph in the source language; providing a word comparison table that records the word correspondences between the source language and the target language, in which the source language word corresponds to at least one candidate target language word; and selecting one of the candidate target language words as the target language word to convert to, according to the co-occurrence relevance, in each of a plurality of language data sources, of a plurality of associated words formed from each corresponding candidate target language word and at least one preceding or following word in the text paragraph.
The present invention further proposes a text conversion system for text conversion between a source language and a target language. The system comprises: an input unit that obtains a source language word from a text paragraph in the source language; a storage unit, coupled to the input unit, that provides a word comparison table recording the word correspondences between the source language and the target language, in which the source language word corresponds to at least one candidate target language word; a conversion unit, coupled to the input unit and the storage unit, that selects one of the candidate target language words as the target language word to convert to, according to the co-occurrence relevance, in each of a plurality of language data sources, of a plurality of associated words formed from each corresponding candidate target language word and at least one preceding or following word in the text paragraph; and an output unit, coupled to the conversion unit, that outputs the text paragraph converted into the target language.
Based on the above, when the present invention converts a text paragraph and a source language word corresponds to several candidate target language words, it selects the most suitable target language word from those candidates according to the co-occurrence relevance of a plurality of associated words formed from each candidate target language word and at least one preceding or following word in the text paragraph, and thereby produces a better text conversion result.
To make the above features and advantages of the present invention easier to understand, embodiments are described in detail below with reference to the accompanying drawings.
Description of drawings
Fig. 1 is a block diagram of a text conversion system according to an embodiment of the present invention.
Fig. 2 is a flowchart of a character conversion method according to an embodiment of the present invention.
Fig. 3 is a flowchart of converting a source language word of the second kind according to an embodiment of the present invention.
Fig. 4 is a flowchart of converting a source language word of the second kind according to another embodiment of the present invention.
Fig. 5 is a block diagram of a text conversion system according to another embodiment of the present invention.
Fig. 6 is a block diagram of a text conversion system according to another embodiment of the present invention.
Fig. 7 is a flowchart of a character conversion method according to another embodiment of the present invention.
Reference numerals:
100: text conversion system;
110: storage unit;
140: classification unit;
150: conversion unit;
160: output unit;
210~250: steps of the character conversion method according to an embodiment of the present invention;
310~330: steps of converting a source language word of the second kind according to an embodiment of the present invention;
410~440: steps of converting a source language word of the second kind according to another embodiment of the present invention;
500: text conversion system;
510: input unit;
520: language model building unit;
530: word comparison table updating unit;
600: text conversion system;
610: input unit;
620: storage unit;
630: conversion unit;
640: output unit;
710~730: steps of the character conversion method according to another embodiment of the present invention.
Detailed description of the embodiments
Fig. 1 is a block diagram of a text conversion system according to an embodiment of the present invention. Referring to Fig. 1, text conversion system 100 comprises storage unit 110, classification unit 140, conversion unit 150, and output unit 160. For example, text conversion system 100 can be implemented in a mobile phone, a personal digital assistant (PDA), an e-book reader, a mobile Internet device (MID), or various kinds of computers. Text conversion system 100 can also be embedded in a browser, in document processing software, or in a website service.
Text conversion system 100 converts a text paragraph in a source language into a target language. For example, it may convert a Simplified Chinese paragraph into Traditional Chinese, a Traditional Chinese paragraph into Simplified Chinese, an English paragraph into Chinese, or a Chinese paragraph into English. The present invention does not limit the source language or the target language. The text paragraph comprises a plurality of source language words (terms), and a source language word can be a single character (word) of the source language or a word or phrase made up of several characters.
Storage unit 110 is, for example, a hard disk drive (HDD), a solid state drive (SSD), or a flash memory storage device; the type of storage unit 110 is not limited here. Storage unit 110 stores the word comparison table consulted during text conversion. This table records the word correspondences between the source language and the target language.
Classification unit 140 is coupled to storage unit 110. According to the word comparison table in storage unit 110, classification unit 140 determines whether each source language word in the text paragraph belongs to the first kind or the second kind. A source language word of the first kind corresponds to only a single target language word; note that the source language word and its target language word do not necessarily have the same number of characters. A source language word of the second kind corresponds to a plurality of candidate target language words.
Conversion unit 150 is coupled to storage unit 110 and classification unit 140. Based on the determination made by classification unit 140, conversion unit 150 converts source language words of different kinds into target language words in different ways, so as to produce the best conversion result.
To further explain how each unit of text conversion system 100 operates, another embodiment is described below. Fig. 2 is a flowchart of a character conversion method according to an embodiment of the present invention. Please refer to Fig. 1 and Fig. 2 together.
First, in step 210, the word comparison table stored in storage unit 110 is provided. This table records the word correspondences between the source language and the target language. Specifically, the table records a number of source language words (each a single character or a phrase made up of several characters) and, for each of them, one or more corresponding target language words (again single characters or phrases). Note that two corresponding words in the table, one in the source language and one in the target language, do not necessarily have the same number of characters. For example, suppose the source language is Simplified Chinese and the target language is Traditional Chinese. The Simplified Chinese word for "grapefruit" in the table corresponds to the Traditional Chinese word for "grapefruit", and the Simplified Chinese word for "bus" corresponds to the Traditional Chinese word for "bus"; in each pair the two written forms differ and may differ in length.
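The table described above can be pictured as a mapping from each source language word to its list of target language words. The following is a minimal sketch under that assumption; `WORDS_TABLE`, `kind_of`, and the three entries are hypothetical illustrations, not the patent's actual table or interface.

```python
# Hypothetical word comparison table: source word -> list of target words.
# The entries are illustrative only.
WORDS_TABLE = {
    "博客": ["部落客"],    # one-to-one; the two sides differ in length (2 vs 3 characters)
    "网志": ["部落格"],    # one-to-one
    "面": ["面", "麵"],    # one-to-many: several candidate target words
}


def kind_of(source_word, table=WORDS_TABLE):
    """Return 1 for a single-target word, 2 for a multi-candidate word, None if absent."""
    targets = table.get(source_word)
    if not targets:
        return None
    return 1 if len(targets) == 1 else 2
```

Storing every entry as a list keeps the one-to-one and one-to-many cases in the same structure, so the kind of a word follows directly from the length of its list.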
Next, in step 220, classification unit 140 performs word segmentation on the text paragraph and obtains a number of segmentation results. In this embodiment, classification unit 140 performs, for example, bi-gram or n-gram segmentation, so that every run of two (or n) consecutive characters in the text paragraph that contains no punctuation mark is cut out as one segmentation result. The present invention does not, however, limit the segmentation algorithm used by classification unit 140.
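As a rough illustration of the bi-gram or n-gram segmentation described above, the sketch below splits a paragraph at punctuation and then slides a window of n characters over each run. The function name `ngram_segments` and the rule that any non-word character counts as punctuation are assumptions made for this example.

```python
import re


def ngram_segments(paragraph, n=2):
    """Cut out every run of n consecutive characters that contains no punctuation."""
    # Assumption: any character that is not a Unicode word character separates runs.
    runs = [run for run in re.split(r"[^\w]+", paragraph) if run]
    segments = []
    for run in runs:
        # Slide a window of n characters across the punctuation-free run.
        segments.extend(run[i:i + n] for i in range(len(run) - n + 1))
    return segments
```

For instance, `ngram_segments("他吃面，好")` yields the two bi-grams `他吃` and `吃面`; the single character after the comma is too short to form a bi-gram.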
Next, in step 230, classification unit 140 compares the segmentation results with the word comparison table in storage unit 110 to determine whether each source language word in the text paragraph belongs to the first kind or the second kind. Specifically, if the table contains a word that partially or fully matches a source language word in the text paragraph, and that word corresponds to only one target language word, the source language word is determined to belong to the first kind. The search for matching words in the table can follow a long-word-first principle. For example, after bi-gram or n-gram segmentation has produced the segmentation results, they are compared with the table one by one, starting with the longest ones, to check whether the table contains a matching entry; if it does, the segmentation result being compared is determined to be a word. After all segmentation results have been compared, the text in the paragraph is decomposed into a plurality of source language words according to the words identified in this way. The decomposition first selects the longest words in the paragraph as source language words, then selects the next longest words from the remaining text, and so on; each single character left over in the paragraph at the end is also treated as a source language word.
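The long-word-first decomposition described above can be sketched as a greedy pass over the text, longest table entries first. This is one plausible reading of the step rather than the patent's exact procedure; `decompose` and its arguments are hypothetical names, and the input is assumed to be a punctuation-free run of characters.

```python
def decompose(text, table_words):
    """Split text into source words, claiming longer table entries first."""
    owner = [None] * len(text)  # owner[i] = (start, end) of the word covering position i
    lengths = sorted({len(w) for w in table_words if len(w) > 1}, reverse=True)
    for n in lengths:  # longest words are claimed first
        for start in range(len(text) - n + 1):
            end = start + n
            is_free = all(o is None for o in owner[start:end])
            if is_free and text[start:end] in table_words:
                for i in range(start, end):
                    owner[i] = (start, end)
    words, i = [], 0
    while i < len(text):
        if owner[i] is None:  # leftover single character becomes its own word
            words.append(text[i])
            i += 1
        else:
            start, end = owner[i]
            words.append(text[start:end])
            i = end
    return words
```

Because positions claimed by a longer word are never reused, a shorter table entry that overlaps a longer match is ignored, which is the effect the long-word-first principle aims for.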
Then, in step 240, conversion unit 150 converts every source language word of the first kind in the text paragraph into its corresponding target language word, according to the word correspondences recorded in the word comparison table. Conversion unit 150 can also apply the long-word-first principle when converting source language words of the first kind.
Finally, in step 250, for each source language word of the second kind, conversion unit 150 selects one of its candidate target language words as the target language word to convert to, according to the co-occurrence relevance of a plurality of associated words formed from each candidate target language word and at least one preceding or following word in the text paragraph. The detailed operation of conversion unit 150 is explained later with reference to the drawings.
After conversion unit 150 has converted each source language word into a target language word, using the approach that matches its kind, output unit 160 outputs the converted text paragraph for the user to view.
In the following embodiment, suppose the source language is Simplified Chinese and the target language is Traditional Chinese. Simplified Chinese uses fewer distinct characters than Traditional Chinese, so one Simplified Chinese character may correspond to several Traditional Chinese characters. As a result, converting a Simplified Chinese paragraph into Traditional Chinese easily runs into Simplified Chinese words that correspond to several Traditional Chinese words. For example, suppose the text paragraph that text conversion system 100 is to convert is 这名博客在网志上面写着，他的爱人煮了碗汤面给他吃 ("This blogger wrote on the blog that his spouse cooked him a bowl of noodle soup").
First, classification unit 140 performs word segmentation on the text paragraph, producing the segmentation results 这名, 名博, 博客, 客在, 在网, 网志, 志上, 上面, 面写, 写着, ..., 碗汤, 汤面, 面给, 给他, 他吃. Classification unit 140 compares these segmentation results with the word comparison table in storage unit 110 and determines that, among all the Simplified Chinese words in the paragraph, only 面 belongs to the second kind; all the other Simplified Chinese words belong to the first kind. According to the word correspondences recorded in the table, the first-kind Simplified Chinese words 这, 名, 博客, 在, 网志, 上, 写, 着, 他, 爱人, 煮, 了, 碗, 汤, 给 and 吃 each correspond to a single Traditional Chinese word; for example, 这 corresponds to 這, 博客 ("blogger") to 部落客, 网志 ("blog") to 部落格, 写 to 寫, 汤 to 湯, and 给 to 給. Conversion unit 150 therefore converts each first-kind Simplified Chinese word directly into its Traditional Chinese word. The Simplified Chinese word 面, however, corresponds to two candidate Traditional Chinese words, 面 ("face, surface") and 麵 ("noodles"). For each occurrence, conversion unit 150 therefore uses the co-occurrence relevance of the associated words formed from each candidate (面 or 麵) and at least one preceding or following word in the paragraph to choose which of the two to convert to. In this embodiment, the conversion result produced by conversion unit 150 is the Traditional Chinese sentence meaning "This blogger wrote on the blog that his wife cooked him a bowl of noodle soup", in which 上面 keeps 面 and 汤面 becomes 湯麵.
In the above embodiment, conversion unit 150 first converts all source language words of the first kind. Then, for each source language word of the second kind, it selects one of the candidate target language words as the target language word to convert to, according to the co-occurrence relevance of the associated words formed from each candidate target language word and the preceding and following words in the text paragraph.
The detailed steps by which conversion unit 150 converts a source language word of the second kind into a suitable target language word are explained below with reference to Fig. 3. In this embodiment, conversion unit 150 can use a language model to compute the co-occurrence relevance of the associated words formed from each candidate target language word and the preceding and following words. The language model is, for example, an n-gram language model, a bi-gram language model, or any other word frequency table that records how often words occur together.
For ease of explanation, the second-kind source language word that conversion unit 150 is currently processing is referred to below as the source language word to be converted. Referring to step 310 of Fig. 3, conversion unit 150 uses a language model to compute, for each candidate target language word of the source language word to be converted, the co-occurrence relevance of the associated words formed from that candidate and at least one preceding or following word in the text paragraph. Specifically, conversion unit 150 takes the position of the source language word to be converted in the text paragraph, obtains at least one preceding or following word from the paragraph (for example the previous character, the next character, the previous two characters, the next two characters, and so on), and combines the candidate target language word with those words to form several associated words. Conversion unit 150 then uses the language model to compute the co-occurrence relevance of these associated words.
For example, suppose the source language is Simplified Chinese, the target language is Traditional Chinese, and the language model used by conversion unit 150 is an n-gram language model. Take the partially converted paragraph that begins 這名部落客在部落格上(面)寫著 and ends 碗湯(面)給他吃, in which each parenthesized Simplified Chinese word 面 is a second-kind source language word that has not yet been converted; its candidate target language words are the Traditional Chinese words 面 and 麵. To convert the 面 in the first pair of parentheses into a suitable target language word, conversion unit 150 uses its position in the paragraph to take at least one preceding or following word from the text 這名部落客在部落格上. Taking the candidate target language word 面 as an example, the associated words it forms with those preceding words are 上面, 格上面, 落格上面, ..., 名部落客在部落格上面 and 這名部落客在部落格上面. Conversion unit 150 looks up in the language model the number of times the word 面 occurs, denoted F(面), and the number of times the associated word 上面 occurs, denoted F(上面). Note that a count of 0 means the language model has no entry for the associated word; in that case conversion unit 150 sets the count to a default value, so that the computed probability is not 0. The probability P(上面) that the associated word 上面 occurs in the language model can be expressed by the following formula:
$$P(\text{上面}) = \frac{F(\text{上面})}{F(\text{面})}$$
Next, conversion unit 150 looks up in the language model the number of times the associated word 格上面 occurs, denoted F(格上面), and computes the probability P(格上面) that the associated word 格上面 occurs in the language model with the following formula:
[Formula image BSA00000381468500102: P(格上面), computed from F(格上面)]
In the same way, conversion unit 150 computes the probability values P(上面), P(格上面), ..., P(名部落客在部落格上面) and P(這名部落客在部落格上面), and uses the product of these probability values as the co-occurrence relevance of the associated words formed from the candidate target language word 面 and the preceding and following words.
Similarly, to determine the co-occurrence relevance of the associated words formed from the candidate target language word 麵 and the preceding and following words, conversion unit 150 computes the probability values P(上麵), P(格上麵), ..., P(名部落客在部落格上麵) and P(這名部落客在部落格上麵), and uses the product of these probability values as the co-occurrence relevance corresponding to the candidate target language word 麵.
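The scoring just described can be sketched as follows. Because the formula images are not reproduced in this text, the sketch assumes that each probability is the count of the associated word divided by the count of the candidate word; that form, the function name `ngram_relevance`, the `counts` dictionary standing in for the language model, and the default value `1e-6` are all assumptions made for illustration. Only preceding characters are used as context, as in the worked example.

```python
from math import prod


def ngram_relevance(candidate, preceding, counts, default=1e-6):
    """Product of P(w) over associated words w = (suffix of preceding) + candidate."""
    # A zero or missing count is replaced by a default so no probability is 0.
    base = counts.get(candidate, 0) or default
    probabilities = []
    for k in range(1, len(preceding) + 1):
        associated = preceding[-k:] + candidate  # e.g. 上面, 格上面, 落格上面, ...
        freq = counts.get(associated, 0) or default
        probabilities.append(freq / base)  # assumed form: F(associated) / F(candidate)
    return prod(probabilities)
```

With hypothetical counts such as `{"面": 100, "上面": 40, "格上面": 10, "麵": 50}` and the context `格上`, the candidate 面 scores far higher than 麵, whose associated words are absent from the counts and fall back to the default.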
Then, in step 320, conversion unit 150 selects, from all the candidate target language words of the source language word, the candidate with the highest co-occurrence relevance as the target language word. Continuing the previous example, suppose the co-occurrence relevance of the candidate target language word 面 is higher than that of the candidate target language word 麵; conversion unit 150 then selects the candidate 面 as the target language word.
Finally, in step 330, conversion unit 150 converts the source language word in the text paragraph into the target language word.
In another embodiment, to speed up processing, conversion unit 150 can instead use a bi-gram language model to compute the co-occurrence relevance of the associated words formed from each candidate target language word and at least one preceding or following word in the text paragraph.
Take again the partially converted paragraph that begins 這名部落客在部落格上(面)寫著 and ends 碗湯(面)給他吃, in which each parenthesized Simplified Chinese word 面 is a second-kind source language word that has not yet been converted and whose candidate target language words are the Traditional Chinese words 面 and 麵. To convert the 面 in the first pair of parentheses into a suitable target language word, conversion unit 150 obtains the preceding and following words from the text 這名部落客在部落格上. Conversion unit 150 then computes the probability values P(上面), P(格上), P(落格), P(部落), ..., P(名部) and P(這名) (each computed in a way similar to the previous embodiment) and uses the product of these probability values as the co-occurrence relevance corresponding to the candidate target language word 面. Conversion unit 150 likewise computes the probability values P(上麵), P(格上), P(落格), P(部落), ..., P(名部) and P(這名) and uses their product as the co-occurrence relevance corresponding to the candidate target language word 麵. Conversion unit 150 then decides which candidate target language word to select as the target language word by comparing the co-occurrence relevance values of the two candidates.
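The bi-gram variant can be sketched in the same spirit. The text only says each probability is computed in a way similar to the previous embodiment, so the form used here, the count of the two-character pair divided by the count of its second character, is an assumption, as are the names `bigram_relevance` and `counts` and the default value. The candidate is assumed to be a single character and only preceding context is used.

```python
from math import prod


def bigram_relevance(candidate, preceding, counts, default=1e-6):
    """Product of P(pair) over every adjacent character pair in preceding + candidate."""
    text = preceding + candidate
    probabilities = []
    for i in range(len(text) - 1):
        pair = text[i:i + 2]  # e.g. 這名, 名部, ..., 格上, 上面
        pair_freq = counts.get(pair, 0) or default
        last_freq = counts.get(pair[1], 0) or default
        probabilities.append(pair_freq / last_freq)  # assumed form: F(pair) / F(second char)
    return prod(probabilities)
```

Every pair except the last is shared by both candidates, so the comparison between 面 and 麵 is effectively decided by the final pair (上面 versus 上麵), while the lookup only ever touches two-character entries, which is what makes this variant cheaper.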
In general, for a source-language word of the second type in the text paragraph, conversion unit 150 can follow the steps shown in Fig. 3 to select the target-language word from the corresponding candidate target-language words. However, when the language model contains too little relevant data, the co-occurrence relevances of the candidates may differ too little, or several candidates may even have identical co-occurrence relevance. For this reason, in another embodiment, conversion unit 150 may follow the steps shown in Fig. 4 to decide which candidate to use as the target-language word.
Referring to Fig. 4, step 410 is the same as or similar to step 310 of Fig. 3 and is not described again here.
As shown in step 420, conversion unit 150 selects, from all the candidate target-language words corresponding to the source-language word, several candidates with higher co-occurrence relevance, that is, candidates whose co-occurrence relevance is greater than a first threshold. The first threshold may be any statistic, such as the mean of the co-occurrence relevances of all the candidates or an upper-percentile value. Thus, when several candidates share the same, highest co-occurrence relevance, all of them are selected as higher-relevance candidates. Likewise, when the co-occurrence relevances of several candidates are clearly higher than those of the other candidates and differ only slightly among themselves (for example, by less than a second threshold), those candidates are selected as higher-relevance candidates.
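A minimal sketch of the shortlisting in step 420, assuming the first threshold is the mean of all relevances. The function name `shortlist` is hypothetical, and keeping candidates that sit exactly on the threshold is an assumption made here so that fully tied candidates all survive.

```python
from statistics import mean


def shortlist(scores):
    """Keep candidates whose co-occurrence relevance reaches the first threshold.

    `scores` maps each candidate target-language word to its relevance.
    The threshold used here is the mean of all relevances (one of the
    statistics the text mentions); ties at the threshold are kept.
    """
    threshold = mean(scores.values())
    return [cand for cand, score in scores.items() if score >= threshold]
```

For example, `shortlist({"A": 0.40, "B": 0.38, "C": 0.02})` keeps the two close, high-scoring candidates and drops the third, leaving the dictionary-based step to decide between the survivors.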
Next, in step 430, conversion unit 150 uses a dictionary that supports the target language and a reference language to translate each word of each higher-relevance candidate into a corresponding reference-language word. Based on the dictionary and these reference-language words, it determines the relevance among the reference-language words of each higher-relevance candidate, and selects as the target-language word the candidate whose reference-language words have the highest relevance.
Finally, as shown in step 440, conversion unit 150 replaces the source-language word in the text paragraph with the target-language word.
For example, suppose the source language is Simplified Chinese, the target language is Traditional Chinese, and the reference language is English. Take the text paragraph "But she still triumph. (Stroke) motion paddle" as an example, where the word in parentheses has not yet been converted and belongs to the second type of source-language word, corresponding to two candidate target-language words. Conversion unit 150 follows the steps shown in Fig. 4 to decide which of the two candidates to use when converting the text paragraph.
In detail, conversion unit 150 in this embodiment takes the position of the source-language word in the text paragraph as the center, obtains the words within n characters before and after it, and combines each candidate target-language word with those words to form the higher-relevance candidate phrases. With n equal to 3, for example, each of the two candidates yields one phrase of the form "contented paddling paddle", differing only in the candidate word used.
Using a dictionary that supports Traditional Chinese and English, conversion unit 150 translates each word of the first candidate phrase into corresponding reference-language words. For example, it translates the first candidate word into the two reference-language words "draw" and "scratch", translates the word for the oar into the reference-language word "oar", and so on. Likewise, for the second candidate phrase, it translates the second candidate word into the reference-language word "paddle", translates the word for the oar into the reference-language word "oar", and so on.
In one embodiment, conversion unit 150 determines the relevance between the corresponding reference-language words from how often each of them appears in the definitions given in the dictionary. For example, in a dictionary that supports Traditional Chinese and English, the reference-language word "paddle" appears in the definition of the reference-language word "oar", whereas "draw" and "scratch" do not. That is, "paddle" occurs more often in the definition of "oar" than "draw" or "scratch" does, so conversion unit 150 judges that the relevance between "paddle" and "oar" is higher than the relevance between "draw" or "scratch" and "oar". Accordingly, conversion unit 150 converts the source-language word in the text paragraph into the candidate translated as "paddle" rather than the candidate translated as "draw" and "scratch".
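The definition-frequency test above can be sketched as follows. The dictionary entries in `DEFINITIONS`, the candidate labels, and the choice to sum counts over a candidate's translations are illustrative assumptions rather than details given in the text.

```python
# Hypothetical dictionary: reference-language word -> its definition text.
DEFINITIONS = {
    "oar": "a long pole with a flat blade used to paddle or row a boat",
    "paddle": "a short oar used to move a small boat",
    "draw": "to make a picture with a pen or pencil",
    "scratch": "to mark a surface with something sharp",
}


def relevance(word, context_word):
    """Number of times `word` occurs in the definition of `context_word`."""
    return DEFINITIONS.get(context_word, "").lower().split().count(word)


def pick_candidate(translations, context_word):
    """Pick the candidate whose translations occur most often in the definition.

    `translations` maps a candidate label to its reference-language words.
    """
    return max(
        translations,
        key=lambda cand: sum(relevance(w, context_word) for w in translations[cand]),
    )
```

With `translations = {"candidate_1": ["draw", "scratch"], "candidate_2": ["paddle"]}`, calling `pick_candidate(translations, "oar")` favours the candidate translated as "paddle", because only that word occurs in the sample definition of "oar".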
In yet another embodiment, conversion unit 150 may instead use a semantic tree to calculate the semantic distance between the corresponding reference-language words and thereby judge their relevance: the shorter the semantic distance, the higher the relevance. Calculating the semantic distance between two words with a semantic tree is a common technique in the art and is not described further here.
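One common way to measure distance in a semantic tree is to count the edges on the path between two words through their lowest common ancestor. The sketch below assumes that measure; the toy taxonomy in `PARENT` is invented for illustration and is not taken from the patent.

```python
# Hypothetical toy taxonomy: child -> parent (None marks the root).
PARENT = {
    "oar": "boating",
    "paddle": "boating",
    "boating": "activity",
    "draw": "art",
    "scratch": "art",
    "art": "activity",
    "activity": None,
}


def path_to_root(word):
    """List of nodes from `word` up to the root, inclusive."""
    path = []
    while word is not None:
        path.append(word)
        word = PARENT[word]
    return path


def semantic_distance(a, b):
    """Edges between `a` and `b` via their lowest common ancestor."""
    depth_from_a = {node: i for i, node in enumerate(path_to_root(a))}
    for j, node in enumerate(path_to_root(b)):
        if node in depth_from_a:
            return depth_from_a[node] + j
    raise ValueError("no common ancestor")
```

In this toy tree "paddle" is closer to "oar" than "draw" is, so a smaller distance again points to the candidate translated as "paddle".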
Fig. 5 is a block diagram of a text conversion system according to another embodiment of the present invention. As shown in Fig. 5, text conversion system 500 comprises storage unit 110, classification unit 140, conversion unit 150, output unit 160, input unit 510, language model building unit 520, and word comparison table updating unit 530. Storage unit 110, classification unit 140, conversion unit 150, and output unit 160 have the same or similar functions as the corresponding units of text conversion system 100 shown in Fig. 1 and are not described again here.
In this embodiment, input unit 510 is coupled to storage unit 110 and receives the text paragraph in the source language.
Language model building unit 520 is coupled to storage unit 110. Storage unit 110 stores at least one corpus, which may be an existing parallel corpus or a parallel corpus that text conversion system 500 produces automatically through mining. Language model building unit 520 builds the language model by training on this corpus. For example, to build an n-gram language model, language model building unit 520 collects statistics over the language material in the corpus to produce word-frequency information, uses maximum likelihood estimation (MLE) to estimate the probabilities of the n-gram language model, and produces the n-gram language model accordingly.
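For the bigram case (n = 2), maximum likelihood estimation reduces to dividing the count of each word pair by the count of its first word. The following sketch assumes pre-tokenized sentences; the function name `train_bigram_mle` is hypothetical.

```python
from collections import Counter


def train_bigram_mle(sentences):
    """Estimate P(w2 | w1) = count(w1, w2) / count(w1 as a bigram history)."""
    histories, bigrams = Counter(), Counter()
    for tokens in sentences:
        histories.update(tokens[:-1])            # every token that starts a bigram
        bigrams.update(zip(tokens, tokens[1:]))  # adjacent token pairs
    return {pair: count / histories[pair[0]] for pair, count in bigrams.items()}
```

Trained on the three toy sentences `[["a", "b"], ["a", "c"], ["a", "b"]]`, the model assigns the pair ("a", "b") twice the probability of ("a", "c"). Plain MLE gives unseen pairs zero probability, so a practical model would add some form of smoothing.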
Because language model building unit 520 builds the language model from the relevance between a word and its preceding and following words, text conversion system 500 can, when using the language model to handle one-to-many conversions, select the word with the higher co-occurrence relevance and thus produce a correct and suitable conversion result.
Word comparison table updating unit 530 is coupled to storage unit 110. Using the word comparison table already stored in storage unit 110, it automatically produces, through web mining, a parallel corpus for the source language and the target language, and updates the content of the word comparison table according to the parallel corpus.
Specifically, word comparison table updating unit 530 obtains a source-language data set and a target-language data set through web mining. The language material in a data set may be words, example sentences, text paragraphs, article fragments, whole articles, and so on. Then, according to the word comparison table already stored in storage unit 110, it finds mutually corresponding source-language material and target-language material in the two data sets and uses them to produce the parallel corpus. For example, word comparison table updating unit 530 takes from each of the source-language and target-language data sets one article that may describe a similar event, and selects from the two articles two example sentences that are similar and possibly aligned. It then uses the two example sentences to calculate an alignment probability for the two articles and judges whether they are high-quality aligned articles. If they are, the two aligned example sentences can serve as one entry in the parallel corpus. In this way, word comparison table updating unit 530 produces the parallel corpus, which is stored in storage unit 110.
In addition, word comparison table updating unit 530 can expand the content of the word comparison table according to the parallel corpus. In detail, from each pair of aligned example sentences stored in the parallel corpus, one in the source language and one in the target language, it finds corresponding words (for example, vocabulary that belongs to the source language and the target language respectively and differs when the two sentences are compared is regarded as mutually corresponding). If a corresponding word pair found in this way does not appear in the word comparison table, word comparison table updating unit 530 adds it to expand the table.
In one embodiment, suppose the source language is Simplified Chinese and the target language is Traditional Chinese. If the number of times the Simplified Chinese word "beer on draft" corresponds to the Traditional Chinese word "draft beer" in the parallel corpus reaches a predetermined number (for example, 10), word comparison table updating unit 530 judges that "beer on draft" and "draft beer" are words that convert to each other. Word comparison table updating unit 530 can build an index (for example, an inverted index) for such mutually convertible words. It can then update the content of the word comparison table according to the word correspondences and the index, or automatically build a new word comparison table.
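The counting rule and the inverted index can be sketched as follows. This assumes the aligned word pairs have already been extracted from the parallel corpus; `update_table`, `build_inverted_index`, and the `min_count` parameter are hypothetical names for illustration.

```python
from collections import Counter, defaultdict


def update_table(aligned_pairs, table, min_count=10):
    """Add (source, target) pairs seen at least `min_count` times to the table."""
    for (src, tgt), seen in Counter(aligned_pairs).items():
        if seen >= min_count:
            table[src].add(tgt)
    return table


def build_inverted_index(table):
    """Map each target-language word back to the source words that produce it."""
    index = defaultdict(set)
    for src, targets in table.items():
        for tgt in targets:
            index[tgt].add(src)
    return index
```

Starting from an empty `defaultdict(set)`, ten occurrences of the pair ("beer on draft", "draft beer") are enough to add it to the table, while a pair seen only three times is ignored.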
The word comparison table updated or built by word comparison table updating unit 530 reflects the differences in terminology between the source language and the target language and can provide word correspondences in which the numbers of characters differ. This helps text conversion system 500 produce better conversion results.
In one embodiment of the present invention, when text conversion system 500 is used in a mobile device such as a mobile phone, PDA, or e-book reader, the processor speed, memory size, and storage space of the device are all more limited. To speed up text conversion, language model building unit 520 therefore tries to reduce the data volume of the language model after building it, thereby improving the processing efficiency of text conversion system 500.
For example, after building the language model as described above, language model building unit 520 may retain only the sentences that contain one-to-many words that are easily converted incorrectly and the sentences that contain words with a higher frequency of occurrence.
In addition, from each retained sentence, language model building unit 520 extracts only the necessary sentence fragment to reduce the data volume further. For example, it takes a high-frequency or one-to-many word as the center and extracts the n words (for example, 3) before and after it to form a shorter sentence fragment; words outside the fragment are deleted. Suppose, for example, that the language model includes the Traditional Chinese sentence "Now he is just six miles away mine back", in which "li" is the high-frequency word. Language model building unit 520 shortens this sentence to the fragment "six miles away from coal".
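The windowing step can be sketched as follows, assuming the sentence is already split into tokens and the set of high-frequency or one-to-many words is known. The function name `trim_sentence` and its parameters are hypothetical.

```python
def trim_sentence(tokens, key_words, n=3):
    """Keep only tokens within n positions of any key word, in original order."""
    keep = set()
    for i, token in enumerate(tokens):
        if token in key_words:
            keep.update(range(max(0, i - n), min(len(tokens), i + n + 1)))
    return [tokens[i] for i in sorted(keep)]
```

With a ten-token sentence `list("abcdefghij")`, key word `"e"`, and `n = 3`, the fragment runs from `"b"` to `"h"`. When a sentence contains several key words far apart, this sketch concatenates their windows; storing them as separate fragments would be an equally reasonable choice.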
Moreover, language model building unit 520 can convert the reduced language model into a binary file to improve processing speed when the language model is used.
Similarly, to reduce the time spent comparing against and searching the word comparison table, word comparison table updating unit 530 can process the word comparison table with a hash function to speed up comparison.
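Both speed-ups can be illustrated with standard Python facilities: a `dict` is a hash table, so lookups take constant time on average, and `pickle` produces a binary serialization. This is only a sketch under those assumptions; the table contents and function names are hypothetical, and the patent does not name a specific hash function or file format.

```python
import pickle

# Hypothetical table: source-language word -> candidate target-language words.
WORD_TABLE = {
    "src_one": ["tgt_a"],
    "src_many": ["tgt_b", "tgt_c"],
}


def classify(word, table):
    """Return 'first' for one-to-one entries, 'second' for one-to-many, else None."""
    targets = table.get(word)  # hash lookup, O(1) on average
    if targets is None:
        return None
    return "first" if len(targets) == 1 else "second"


def to_binary(table):
    """Serialize the table to bytes that could be written to a binary file."""
    return pickle.dumps(table)


def from_binary(blob):
    """Restore a table from its binary form."""
    return pickle.loads(blob)
```

The same lookup also yields the first-type and second-type classification used by the classification unit, since the type depends only on how many candidates an entry holds.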
Fig. 6 is a block diagram of a text conversion system according to yet another embodiment of the present invention. Referring to Fig. 6, text conversion system 600 comprises input unit 610, storage unit 620, conversion unit 630, and output unit 640. Text conversion system 600 can be applied to mobile phones, personal digital assistants, e-book readers, various computers, or mobile Internet devices. Alternatively, it can be embedded in a browser, document-processing software, or a website service. Text conversion system 600 converts a text paragraph in a source language into a target language; the source language and the target language are not limited here.
In this embodiment, input unit 610 obtains a source-language word from the text paragraph in the source language.
Storage unit 620 is coupled to input unit 610. Storage unit 620 is, for example, a storage device such as a hard disk, a solid-state drive, or flash memory, and provides a word comparison table. The word comparison table records the word correspondence between the source language and the target language, and the source-language word corresponds to at least one candidate target-language word. The word comparison table in storage unit 620 is the same as or similar to the one in storage unit 110 of Fig. 1 and is not described again here.
Conversion unit 630 is coupled to input unit 610, storage unit 620, and output unit 640. Conversion unit 630 consults several language data sources to decide how to convert the source-language word in the text paragraph into a target-language word. Output unit 640 then outputs the text paragraph converted into the target language.
In another embodiment, text conversion system 600 further comprises a communication unit (not shown). The communication unit is coupled to conversion unit 630 and connects to each language data source through a communication network.
The detailed operation of text conversion system 600 is described below with reference to Fig. 6 and Fig. 7 together.
First, as shown in step 710, input unit 610 obtains a source-language word from the text paragraph in the source language. Next, in step 720, the word comparison table recorded in storage unit 620 is provided. The word comparison table records the word correspondence between the source language and the target language, and the source-language word corresponds to at least one candidate target-language word.
As shown in step 730, conversion unit 630 chooses one of the candidate target-language words as the target-language word to convert to, according to the co-occurrence relevance, in each of several language data sources, of the associated words formed by each candidate target-language word corresponding to the source-language word and at least one preceding or following word in the text paragraph.
For example, the language data sources are web pages, online articles, language databases, and the like. Conversion unit 630 can use a language model to calculate the co-occurrence relevance, in each of these language data sources, of the associated words formed by each candidate target-language word and at least one preceding or following word in the text paragraph. The language model may be an n-gram language model, a bigram language model, or any other vocabulary frequency table that records how often words occur together; it is not limited here. The co-occurrence relevance is calculated in a manner similar to the previous embodiments and is not described again here.
In another embodiment, conversion unit 630 obtains the co-occurrence relevance of the associated words in the language data sources by searching those sources (web pages, online articles, language databases, and the like) through a search engine or a query interface, counting the number or frequency of occurrences of each group of associated words, and selecting the candidate whose associated words occur more often as the target-language word to convert to.
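The counting approach can be sketched with in-memory strings standing in for the language data sources. A real system would query a search engine or database interface instead; the function names and the plain substring count used here are assumptions for illustration.

```python
def count_in_sources(phrase, sources):
    """Total number of occurrences of `phrase` across all source texts."""
    return sum(text.count(phrase) for text in sources)


def choose_by_frequency(before, candidates, after, sources):
    """Pick the candidate whose surrounding phrase occurs most often in the sources."""
    return max(
        candidates,
        key=lambda cand: count_in_sources(before + cand + after, sources),
    )
```

Given the sample sources `["hot soup noodles today", "soup noodles again", "soup face"]`, the phrase built from "noodles" occurs twice and the phrase built from "face" once, so the former candidate is chosen.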
Conversion unit 630 selects, from all the candidate target-language words, the one with the highest co-occurrence relevance as the target-language word and replaces the source-language word with it in the text paragraph. Output unit 640 then outputs the text paragraph converted into the target language.
As described above, after receiving a text paragraph in the source language, text conversion system 600 searches a large number of relevant language data sources on the network, such as web pages, online articles, and language databases, and then decides which of the at least one candidate target-language word corresponding to the source-language word to convert to, so as to produce a better conversion result.
It should be noted that although the above embodiments are described with Simplified Chinese as the source language and Traditional Chinese as the target language, the present invention is not limited thereto. In other embodiments, the source language may be Traditional Chinese and the target language Simplified Chinese, or the source language may be Chinese and the target language English. The present invention does not limit the source language or the target language.
In summary, when converting a text paragraph from the source language into the target language, the text conversion method and system of the present invention automatically handle the differences in terminology between the languages. For one-to-many word correspondences, they also select the most suitable word automatically, according to the co-occurrence relevance of the associated words formed by each corresponding candidate target-language word and at least one preceding or following word in the text paragraph. The correctness of converting a text paragraph into a different language can thus be improved significantly.
Although the present invention has been disclosed above by way of embodiments, they are not intended to limit the invention. Any person of ordinary skill in the art may make minor changes and refinements without departing from the spirit and scope of the invention, so the scope of protection of the invention is defined by the claims.

Claims (20)

1. A text conversion method for converting a text paragraph in a source language into a target language, wherein the text paragraph comprises a plurality of source-language words, characterized in that the method comprises the following steps:
providing a word comparison table that records the word correspondence between the source language and the target language;
performing word segmentation on the text paragraph to obtain a plurality of word segmentation results;
comparing the word segmentation results with the word comparison table to determine whether each source-language word belongs to a first type or a second type, wherein a source-language word of the first type corresponds to only a single target-language word, and a source-language word of the second type corresponds to a plurality of candidate target-language words;
converting, in the text paragraph and according to the word correspondence recorded in the word comparison table, each source-language word of the first type into the corresponding target-language word; and
for each source-language word of the second type, choosing one of the candidate target-language words as the target-language word to convert to, according to the co-occurrence relevance of a plurality of associated words formed by each corresponding candidate target-language word and at least one preceding or following word in the text paragraph.
2. The text conversion method according to claim 1, characterized in that the step of choosing one of the candidate target-language words as the target-language word, according to the co-occurrence relevance of the associated words formed by each corresponding candidate target-language word and the at least one preceding or following word in the text paragraph, comprises:
using a language model to calculate the co-occurrence relevance of the associated words formed by each candidate target-language word and the at least one preceding or following word;
selecting, from the candidate target-language words, the candidate with the highest co-occurrence relevance as the target-language word; and
replacing the source-language word in the text paragraph with the target-language word.
3. The text conversion method according to claim 1, characterized in that the step of choosing one of the candidate target-language words as the target-language word, according to the co-occurrence relevance of the associated words formed by each corresponding candidate target-language word and the at least one preceding or following word in the text paragraph, comprises:
using a language model to calculate the co-occurrence relevance of the associated words formed by each candidate target-language word and the at least one preceding or following word;
selecting, from the candidate target-language words, a plurality of higher-relevance candidate target-language words, wherein the higher-relevance candidate target-language words are those whose co-occurrence relevance is greater than a first threshold; and
using a dictionary that supports the target language and a reference language to translate each word of each higher-relevance candidate target-language word into a corresponding reference-language word, determining, from the dictionary and each corresponding reference-language word, the relevance among the corresponding reference-language words of each higher-relevance candidate target-language word, and selecting the candidate whose corresponding reference-language words have the highest relevance as the target-language word.
4. The text conversion method according to claim 3, characterized in that the step of determining the relevance among the corresponding reference-language words comprises:
determining the relevance among the corresponding reference-language words according to the frequency with which each corresponding reference-language word occurs in a plurality of definitions in the dictionary.
5. The text conversion method according to claim 2, characterized in that it further comprises the following step:
building the language model by training on at least one corpus.
6. The text conversion method according to claim 1, characterized in that it further comprises the following steps:
obtaining a source-language data set and a target-language data set through web mining;
finding mutually corresponding source-language material and target-language material in the source-language data set and the target-language data set, respectively;
producing a parallel corpus from the source-language material and the target-language material; and
expanding the content of the word comparison table according to the parallel corpus.
7. A text conversion system for converting a text paragraph in a source language into a target language, wherein the text paragraph comprises a plurality of source-language words, characterized in that the system comprises:
a storage unit for storing a word comparison table that records the word correspondence between the source language and the target language;
a classification unit, coupled to the storage unit, which performs word segmentation on the text paragraph to obtain a plurality of word segmentation results and compares the word segmentation results with the word comparison table to determine whether each source-language word belongs to a first type or a second type, wherein a source-language word of the first type corresponds to only a single target-language word, and a source-language word of the second type corresponds to a plurality of candidate target-language words;
a conversion unit, coupled to the storage unit and the classification unit, which converts, in the text paragraph and according to the word correspondence recorded in the word comparison table, each source-language word of the first type into the corresponding target-language word, and which, for each source-language word of the second type, chooses one of the candidate target-language words as the target-language word to convert to, according to the co-occurrence relevance of a plurality of associated words formed by each corresponding candidate target-language word and at least one preceding or following word in the text paragraph; and
an output unit, coupled to the conversion unit, for outputting the text paragraph converted into the target language.
8. The text conversion system according to claim 7, characterized in that the system further comprises:
an input unit, coupled to the storage unit, for receiving the text paragraph in the source language.
9. The text conversion system according to claim 7, characterized in that the conversion unit uses a language model to calculate the co-occurrence relevance of the associated words formed by each candidate target-language word and the at least one preceding or following word; selects, from the candidate target-language words, the candidate with the highest co-occurrence relevance as the target-language word; and replaces the source-language word in the text paragraph with the target-language word.
10. The text conversion system according to claim 7, characterized in that the conversion unit uses a language model to calculate the co-occurrence relevance of the associated words formed by each candidate target-language word and the at least one preceding or following word; selects, from the candidate target-language words, a plurality of higher-relevance candidate target-language words, wherein the higher-relevance candidate target-language words are those whose co-occurrence relevance is greater than a first threshold; and uses a dictionary that supports the target language and a reference language to translate each word of each higher-relevance candidate target-language word into a corresponding reference-language word, determines, from the dictionary and each corresponding reference-language word, the relevance among the corresponding reference-language words of each higher-relevance candidate target-language word, and selects the candidate whose corresponding reference-language words have the highest relevance as the target-language word.
11. The text conversion system according to claim 10, characterized in that the conversion unit further determines the relevance among the corresponding reference-language words according to the frequency with which each corresponding reference-language word occurs in a plurality of definitions in the dictionary.
12. The text conversion system according to claim 7, characterized in that the storage unit further stores at least one corpus, and the text conversion system further comprises a language model building unit, coupled to the storage unit, for building the language model by training on the at least one corpus.
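Claim 12 does not specify the type of language model. As a minimal sketch, assuming a bigram model with add-one smoothing trained on an already segmented corpus (the function name and the toy corpus are hypothetical):

```python
from collections import Counter


def train_bigram_model(sentences):
    """Build an add-one smoothed bigram model from lists of segmented words."""
    vocabulary = set()
    contexts = Counter()  # how often a word is followed by another word
    bigrams = Counter()   # how often a specific adjacent pair occurs
    for words in sentences:
        vocabulary.update(words)
        contexts.update(words[:-1])
        bigrams.update(zip(words, words[1:]))
    vocab_size = len(vocabulary)

    def probability(prev_word, word):
        """Estimate P(word | prev_word) with add-one smoothing."""
        return (bigrams[(prev_word, word)] + 1) / (contexts[prev_word] + vocab_size)

    return probability


# Toy corpus: "a" is followed by "b" twice and by "c" once.
probability = train_bigram_model([["a", "b"], ["a", "c"], ["a", "b"]])
p_ab = probability("a", "b")  # (2 + 1) / (3 + 3)
```

The smoothing keeps unseen pairs from receiving a probability of zero, which matters when candidates are compared by their scores.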
13. The text conversion system according to claim 7, characterized in that it further comprises:
a bilingual word comparison table updating unit, coupled to the storage unit, which mines a network to obtain a source-language data set and a target-language data set; finds, in the source-language data set and the target-language data set respectively, a source-language corpus text and a target-language corpus text that correspond to each other; generates a parallel corpus from the source-language corpus text and the target-language corpus text; and expands the content of the word comparison table according to the parallel corpus.
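The last step of claim 13, expanding the word comparison table from a parallel corpus, can be sketched as follows. The claim does not say how words are aligned, so this sketch assumes the simplest case: both sides are already segmented and align position by position. The function name and the sample data are hypothetical.

```python
def expand_table(table, parallel_corpus):
    """Add word pairs from aligned sentence pairs to the word comparison table.

    `table` maps a source-language word to a list of candidate target-language
    words; `parallel_corpus` is a list of (source_words, target_words) pairs.
    """
    for source_words, target_words in parallel_corpus:
        if len(source_words) != len(target_words):
            continue  # this naive position-based alignment cannot handle the pair
        for src, tgt in zip(source_words, target_words):
            candidates = table.setdefault(src, [])
            if tgt not in candidates:
                candidates.append(tgt)
    return table


table = {"s1": ["t1"]}
corpus = [
    (["s1", "s2"], ["t1b", "t2"]),  # adds a second candidate for s1 and a new entry
    (["s1"], ["x", "y"]),           # skipped: lengths differ
]
expand_table(table, corpus)
```

A source word that gains more than one candidate in this way becomes a word that needs the co-occurrence-based selection described in the other claims.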
14. A text conversion method for performing text conversion between a source language and a target language, characterized in that the method comprises:
obtaining a source-language word from a text paragraph written in the source language;
providing a word comparison table that records the word correspondences between the source language and the target language, wherein the source-language word corresponds to at least one candidate target-language word; and
selecting one of the at least one candidate target-language word as the target-language word to convert to, according to the co-occurrence relevance, in each of a plurality of language data sources, of a plurality of associated words formed from each of the at least one candidate target-language word and at least one preceding or following word in the text paragraph.
15. The text conversion method according to claim 14, characterized in that the step of selecting one of the at least one candidate target-language word as the target-language word, according to the co-occurrence relevance in each of the language data sources of the associated words formed from each of the at least one candidate target-language word and the at least one preceding or following word, comprises:
using a language model to calculate, for each of the at least one candidate target-language word, the co-occurrence relevance in each of the language data sources of the associated words that the candidate forms with the at least one preceding or following word;
selecting, from the at least one candidate target-language word, the candidate with the highest co-occurrence relevance as the target-language word; and
replacing the source-language word in the text paragraph with the target-language word.
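The method of claims 14 and 15 can be sketched end to end as follows. This is an illustration under stated assumptions, not the patented implementation: each language data source is modelled as a dictionary of adjacent-pair counts, the counts are summed across sources, and neighbouring words are taken in their provisionally converted form. All names and data are hypothetical.

```python
def convert(paragraph, table, sources):
    """Convert a segmented source-language paragraph word by word.

    `table` maps a source word to its candidate target words; `sources` is a
    list of dicts mapping adjacent target-word pairs to co-occurrence counts.
    """
    # Provisional conversion: take the first candidate (or keep the word).
    provisional = [table.get(word, [word])[0] for word in paragraph]
    result = list(provisional)

    for i, word in enumerate(paragraph):
        candidates = table.get(word, [word])
        if len(candidates) < 2:
            continue  # only one possible target word, nothing to choose
        prev_word = result[i - 1] if i > 0 else None
        next_word = provisional[i + 1] if i + 1 < len(paragraph) else None

        def score(candidate):
            """Sum the pair counts with both neighbours over all data sources."""
            total = 0
            for counts in sources:
                if prev_word is not None:
                    total += counts.get((prev_word, candidate), 0)
                if next_word is not None:
                    total += counts.get((candidate, next_word), 0)
            return total

        result[i] = max(candidates, key=score)
    return result


table = {"S1": ["the"], "S2": ["document", "file"], "S3": ["menu"]}
sources = [
    {("the", "file"): 2, ("the", "document"): 1},  # e.g. counts from web pages
    {("file", "menu"): 3},                         # e.g. counts from a database
]
converted = convert(["S1", "S2", "S3"], table, sources)
```

For the ambiguous word "S2", "file" collects a total of 5 across the two sources against 1 for "document", so the paragraph converts to "the file menu".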
16. The text conversion method according to claim 14, characterized in that the language data sources comprise web pages, online articles and language databases.
17. A text conversion system for performing text conversion between a source language and a target language, characterized in that the system comprises:
an input unit, which obtains a source-language word from a text paragraph written in the source language;
a storage unit, coupled to the input unit, which provides a word comparison table that records the word correspondences between the source language and the target language, wherein the source-language word corresponds to at least one candidate target-language word;
a conversion unit, coupled to the input unit and the storage unit, which selects one of the at least one candidate target-language word as the target-language word to convert to, according to the co-occurrence relevance, in each of a plurality of language data sources, of a plurality of associated words formed from each of the at least one candidate target-language word and at least one preceding or following word in the text paragraph; and
an output unit, coupled to the conversion unit, for outputting the text paragraph that has been converted into the target language.
18. The text conversion system according to claim 17, characterized in that the conversion unit uses a language model to calculate, for each of the at least one candidate target-language word, the co-occurrence relevance in each of the language data sources of the associated words that the candidate forms with the at least one preceding or following word; selects, from the at least one candidate target-language word, the candidate with the highest co-occurrence relevance as the target-language word; and replaces the source-language word in the text paragraph with the target-language word.
19. The text conversion system according to claim 17, characterized in that the language data sources comprise web pages, online articles and language databases.
20. The text conversion system according to claim 17, characterized in that the system further comprises a communication unit, coupled to the conversion unit, for connecting to the language data sources through a communication network.
CN201010576958.7A 2010-12-02 2010-12-02 Character conversion method and system Active CN102486770B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201010576958.7A CN102486770B (en) 2010-12-02 2010-12-02 Character conversion method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201010576958.7A CN102486770B (en) 2010-12-02 2010-12-02 Character conversion method and system

Publications (2)

Publication Number Publication Date
CN102486770A true CN102486770A (en) 2012-06-06
CN102486770B CN102486770B (en) 2014-09-17

Family

ID=46152264

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201010576958.7A Active CN102486770B (en) 2010-12-02 2010-12-02 Character conversion method and system

Country Status (1)

Country Link
CN (1) CN102486770B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104794110A (en) * 2014-01-20 2015-07-22 腾讯科技(深圳)有限公司 Machine translation method and device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020198701A1 (en) * 2001-06-20 2002-12-26 Moore Robert C. Statistical method and apparatus for learning translation relationships among words
US20040024581A1 (en) * 2002-03-28 2004-02-05 Philipp Koehn Statistical machine translation
CN101295298A (en) * 2007-04-23 2008-10-29 株式会社船井电机新应用技术研究所 Translation system, translation program, and bilingual data generation method
CN101707873A (en) * 2007-03-26 2010-05-12 谷歌公司 Large language models in machine translation

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020198701A1 (en) * 2001-06-20 2002-12-26 Moore Robert C. Statistical method and apparatus for learning translation relationships among words
US20040024581A1 (en) * 2002-03-28 2004-02-05 Philipp Koehn Statistical machine translation
CN101707873A (en) * 2007-03-26 2010-05-12 谷歌公司 Large language models in machine translation
CN101295298A (en) * 2007-04-23 2008-10-29 株式会社船井电机新应用技术研究所 Translation system, translation program, and bilingual data generation method

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104794110A (en) * 2014-01-20 2015-07-22 腾讯科技(深圳)有限公司 Machine translation method and device
CN104794110B (en) * 2014-01-20 2018-11-23 腾讯科技(深圳)有限公司 Machine translation method and device

Also Published As

Publication number Publication date
CN102486770B (en) 2014-09-17

Similar Documents

Publication Publication Date Title
TWI434187B (en) Text conversion method and system
US11416679B2 (en) System and method for inputting text into electronic devices
US10402493B2 (en) System and method for inputting text into electronic devices
US11216504B2 (en) Document recommendation method and device based on semantic tag
US20180341871A1 (en) Utilizing deep learning with an information retrieval mechanism to provide question answering in restricted domains
US20160026258A1 (en) Virtual keyboard input for international languages
CN104199965A (en) Semantic information retrieval method
CN104008126A (en) Method and device for segmentation on basis of webpage content classification
CN111428494A (en) Intelligent error correction method, device and equipment for proper nouns and storage medium
CN102214166A (en) Machine translation system and machine translation method based on syntactic analysis and hierarchical model
US20040186706A1 (en) Translation system, dictionary updating server, translation method, and program and recording medium for use therein
CN107038225A (en) The search method of information intelligent retrieval system
CN111832299A (en) Chinese word segmentation system
Jain et al. Context sensitive text summarization using k means clustering algorithm
CN111950301A (en) English translation quality analysis method and system for Chinese translation and English translation
KR20150083961A (en) The method for searching integrated multilingual consonant pattern, for generating a character input unit to input consonants and apparatus thereof
CN104281275A (en) Method and device for inputting English
CN109885641A (en) A kind of method and system of database Chinese Full Text Retrieval
CN103064847A (en) Indexing equipment, indexing method, search device, search method and search system
CN102135957A (en) Clause translating method and device
CN102486770B (en) Character conversion method and system
CN115438048A (en) Table searching method, device, equipment and storage medium
CN114996455A (en) News title short text classification method based on double knowledge maps
CN108197118A (en) A kind of method that automatic indexing and retrieval are carried out using computer system
CN109947947B (en) Text classification method and device and computer readable storage medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant