CA2216387C - Integrated multilingual browser - Google Patents

Integrated multilingual browser

Info

Publication number
CA2216387C
Authority
CA
Canada
Prior art keywords
language
document
browser
documents
html
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Lifetime
Application number
CA002216387A
Other languages
French (fr)
Other versions
CA2216387A1 (en
Inventor
Mary A. Flanagan
John A. Lammers
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Microsoft Corp
Original Assignee
Microsoft Technology Licensing LLC
America Online Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Microsoft Technology Licensing LLC, America Online Inc
Publication of CA2216387A1
Application granted
Publication of CA2216387C
Anticipated expiration
Expired - Lifetime

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q99/00 Subject matter not provided for in other groups of this subclass
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/131 Fragmentation of text files, e.g. creating reusable text-blocks; Linking to fragments, e.g. using XInclude; Namespaces
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/14 Tree-structured documents
    • G06F40/143 Markup, e.g. Standard Generalized Markup Language [SGML] or Document Type Definition [DTD]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/151 Transformation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/221 Parsing markup language streams
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/42 Data-driven translation
    • G06F40/47 Machine-assisted translation, e.g. using translation memory
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/58 Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation

Abstract

The disclosed system translates HTML documents (16) available through the World Wide Web into different languages. HTML documents (16) are translated by machine translation software (10) bundled in a browser (12). Alternatively, documents are retrieved as needed, translated, and stored on a Web server so user requests are serviced with a document that has been translated from a different language. The disclosed invention expands usage of the Internet for non-English speakers.

Description

INTEGRATED MULTILINGUAL BROWSER
BACKGROUND AND SUMMARY OF THE INVENTION
The present invention relates generally to the field of electronic communication over a computer network. Particularly, the present invention relates to the expansion of multilingual electronic communication through translation services for documents and messages available through the Internet.
The recent surge in media attention to the Internet, and especially the World Wide Web, coupled with the continuing growth in home PC ownership have resulted in a growing diversity of the Internet user population. No longer is the Internet the province of software experts; thousands of novice users have begun to come online each day.
Software like CompuServe's Web Browser lets users quickly connect to and find useful content online. This phenomenon is not restricted to the United States or to English-speaking countries. Growth in online usage in Europe and Asia is increasing even more quickly than in the U.S.
While interest in the online world is at a peak, a significant obstacle exists to broad usage of the Internet for non-English speakers. The vast majority of Internet content is in English, and is therefore inaccessible to users with other native languages.
Translation of Internet documents by a human translator is not a practical solution for two reasons. First, human translation is costly and slow. A translator can typically produce 300-400 words per hour at costs of 12¢ per word or more. Second, in order to have a translator convert Internet documents to the user's native language, the user would have to download every document he was interested in to provide it to the translator. This is a time-consuming process, and if the user knows no English, he will not even be able to assess the relevance of the document before downloading it. This would result in wasted time and translation costs since inevitably, some of the documents selected will not prove to be worthwhile.
The present invention allows non-English speaking Internet users to access and understand information available from the World Wide Web and related sources.
Language translation software (known as machine translation, or MT) is combined with Internet software to allow non-English speaking Internet users to quickly generate translations of online text. The process is automated and therefore, less costly and time-consuming than human translation.
Accordingly, the present invention provides a method for translating a plurality of documents comprising codes and data characters in a first language, comprising the steps of:
specifying a second language;
transmitting a request for a first document from a browser to a web server;
transmitting said first document to a preprocessor, said preprocessor adapted to insert markers around said codes in said first document;
inserting markers around said codes in said first document;
transmitting said first document to a language translator, said language translator adapted to leave said marked codes untranslated;
translating said data characters in said first language to data characters in said second language using said language translator;
transmitting said first translated document to a postprocessor, said postprocessor adapted to remove said markers from said first translated document;
removing said markers from said first translated document;
displaying said first translated document at said browser;
selecting in said first translated document a link to a second document in said first language;
transmitting a request for said second document from a browser to a web server;
transmitting said second document to said preprocessor; inserting markers around codes in said second document;
transmitting said second document to said language translator;
translating said data characters in said first language to data characters in said second language using said language translator;
transmitting said second translated document to said postprocessor;
removing said markers from said second translated document;
and displaying said second translated document at said browser.
The present invention also provides a document translation system for translating documents comprising codes and data characters in a first language, said system comprising:
a browser for requesting said documents from a web server in accordance with links in said documents and for specifying a second language;
a preprocessor for marking said codes in said documents;
a language translator for translating into said second language said data characters in said preprocessed documents; and a postprocessor for unmarking said codes in said translated documents.
The present invention also provides a method for translating documents, comprising the steps of:
defining in an original language a document containing display and reference codes and data characters exclusive of said display and reference codes;
storing said document at a server;
selecting a target language, said target language specified by a user of a browser;
requesting said document from said server, said document requested by said user in accordance with a link;
preprocessing said document to insert markers around said reference codes;
translating said data characters in said document from said original language to said target language;
postprocessing said document to remove said markers from around said reference codes; and transmitting said document in said target language to a browser.
In a further aspect, the present invention provides a system for automated translation of HTML documents comprising:
a browser for requesting and displaying HTML documents;
a server for processing browser requests for HTML documents;
a connection between said browser to said server for transmitting requests from said browser to said server and HTML documents from said server to said browser;
a request from said browser for a HTML document in a source language, said HTML document requested in accordance with a link to said HTML document in said source language;
a target language specified in accordance with said browser; and machine translation software for translating said HTML document in said source language to a HTML document in said target language.
The present invention also provides a method for providing HTML documents in a plurality of languages comprising the steps of:
retrieving from a server a HTML document in a source language, said HTML
document selected in accordance with a link;
determining a target language, said target language specified by a user of a browser;
translating said HTML document in said source language to a HTML document in said target language; and displaying said HTML document in said target language at said browser.
The present invention also provides a multilingual browser comprising:
a browser for requesting and displaying HTML documents, said browser adapted for specification of a target language; and machine translation software integrated into said browser, said machine translation software adapted to translate a HTML document in a source language to a HTML document in said target language, in accordance with a user of said browser selecting a link to said HTML document in said source document.
The present invention also provides a system for automated translation of HTML documents, comprising:
a plurality of web servers adapted to store a plurality of HTML documents in a source language;
a personal computer equipped with a browser for accessing HTML documents;
a server for processing requests from said browser at said personal computer for access to said plurality of HTML documents;
a target language specified by a user of said personal computer;
a request for one of said plurality of HTML documents in said source language;
a language translator for translating said requested HTML document in said source language to a HTML document in said target language; and a display at said personal computer for displaying said HTML document in said target language.
Advantages of the present invention are explained further in relation to the following detailed description of the invention, drawings, and claims.
Figures 1A and 1B comprise a screen shot of a World Wide Web page;
Figure 2 is an example of a hypertext document;
Figure 3 is an example of a hypertext document preprocessed according to the method of the present invention;
Figure 4 illustrates a system for performing machine translation;
Figures 5A and 5B comprise an example of a preprocessed hypertext document translated according to the method of the present invention;
Figure 6 is an example of a translated hypertext document postprocessed according to the method of the present invention;
Figures 7A and 7B comprise a screen shot of a World Wide Web page that has been translated according to the method of the present invention;
Figure 8 is a diagrammatic view of one embodiment of the present invention in which machine translation is integrated into a Web browser; and
Figure 9 is a diagrammatic view of one embodiment of the present invention in which pre-translated Web pages are accessible from a server.
DETAILED DESCRIPTION OF PREFERRED EMBODIMENT(S)
Although the detailed description of a preferred embodiment focuses on automatic translation of World Wide Web pages, the concept is adaptable to documents obtained from other sources.
The World Wide Web (WWW or the Web) is a distributed information system that may be accessed through a number of sources. It is comprised of software and a set of protocols and conventions. Information on the Web may be accessed using a browser program such as CompuServe's Web Browser. Browsers allow users to read documents and to locate documents from other sources. They present an interface for interacting with the system and they process requests on behalf of the user.
Information providers on the WWW make their information available through programs that understand the HyperText Transfer Protocol (HTTP). Browsers assist users in "visiting" Web sites where information is stored. Information is displayed in pages of text and graphics called "Web Pages." An example of a Web page as viewed through CompuServe's Web Browser is provided in Figures 1A and 1B. The Web page shown in Figures 1A and 1B contains both text 14, 18 and graphics 10, 12, 16. The title bar 20, menu options 22, buttons 24, and document information 26 appearing at the top of the screen are part of the browser used to view the Web page.
In most cases, information providers make information available through a Web server.
The server responds to information requests by delivering the requested information to the user's browser for viewing. Some providers may make their information available through a proxy server that converts information in one format to the format expected and understood by the browser.
WO 97/18516 PCT/US96/18102
Documents available on the WWW and displayed by browsers are hypertext documents. Hypertext is text that contains references (or "links," "hyperlinks," or "hot spots") to other documents. The reference is similar to a footnote except the referenced document may be accessed directly from the original document. The related document may be viewed by selecting or clicking the mouse on the reference. The process of selecting hyperlinks to view referenced documents may be referred to as "traversing the hyperlinks."
Unlike a footnote, references usually do not appear as shorthand descriptions of related documents. Instead, references may be indicated by a combination of graphics, different fonts, different colors for the text, underlining, the mouse pointer turning into a hand, etc. The referenced documents may reside on different computers at different Web sites.
Hypertext documents are written in a "markup language" called Hypertext Markup Language (HTML). HTML actually refers to both a document type and the markup language that represents instances of the document type. A hypertext document contains general semantics appropriate for representing display or presentation characteristics as well as information from a wide range of domains. A hypertext document consists of a sequence or stream of characters that comprise both data characters and markups. Markups are syntactically delimited characters (such as "<," ">," '~," etc.) added to the data characters to define the document's structure. Markups thus have special meanings and may represent such things as hypertext, news, mail, documentation, menus of options, and in-line graphics.
Markups may be combined with other characters or related values to create codes that also have special meaning. Data characters are those characters in the document that are not codes.
Figure 2 is the hypertext document that describes the Web page shown in Figures 1A and 1B. Figure 2 shows the markups and related words (that comprise codes) as well as data characters that may appear in a hypertext document. For example, the characters "<" and ">" appearing throughout the document are markups. The characters "<" and ">" combined with the word "head" ("<head>") 10 may be considered a code. Finally, the text "NLT Home" 10 that is not surrounded by markups or codes may be considered data characters.
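The distinction between codes and data characters can be illustrated with a short Python sketch. It is not taken from the patent: the function name `classify` and the rule that anything between "<" and ">" counts as a code are simplifying assumptions that ignore comments, scripts, and character entities.

```python
import re

# Assumption: a code is any run of characters from "<" to the next ">".
CODE_PATTERN = re.compile(r"<[^<>]*>")


def classify(html: str) -> list[tuple[str, str]]:
    """Split a character stream into ("code", text) and ("data", text) segments."""
    segments = []
    position = 0
    for match in CODE_PATTERN.finditer(html):
        # Text between the previous code and this one is data.
        if match.start() > position:
            segments.append(("data", html[position:match.start()]))
        segments.append(("code", match.group()))
        position = match.end()
    # Any trailing text after the last code is data as well.
    if position < len(html):
        segments.append(("data", html[position:]))
    return segments


if __name__ == "__main__":
    for kind, text in classify("<head><title>NLT Home</title></head>"):
        print(kind, repr(text))
```

Run on a title line, the sketch reports "NLT Home" as the only data segment; everything else is a code.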
As indicated by the brief description, HTML documents have a well-defined and documented structure defined by a grammar. The codes in a HTML document convey important information regarding both the display or presentation of the document itself as well as related references and commands. Display and presentation information may include color information, information about graphics that appear on the page, information about text that appears on the page, etc. A HTML document is structured as a series of elements that are identified by the language markups and codes. A document includes a head (consisting of a title and other optional elements) and a body that is a text flow of paragraphs, lists, images, and other elements. The various parts of the document may be identified by looking at the markups or codes in the document. For example, referring again to Figure 2 which shows the hypertext for Figures 1A and 1B, the document head contains the title "NLT
Home" 10. An image contained in the document is identified in the line "<br><img src--'~le:/l/n~/iowebsrv/server/8100--~1.1/server 1/image/ntl jpg"
height=60 width=640></center>" 12.
As may be apparent, the process of translating a HTML document requires examination of each character in the document. Characters may be examined individually and in combination to determine whether they are markups, codes, or data characters. To process a document, the processing software examines the character stream that comprises the document. The steps needed to translate a HTML document from one language to another may be summarized as follows:
Step 1. Preprocess the HTML document by placing boundary markers around HTML codes to be preserved during the translation process. The translation software recognizes the boundary markers and does not translate text and symbols appearing between the markers.
Step 2. Translate the preprocessed HTML document from the original language to the target language.
Step 3. Postprocess the translated HTML document to remove the boundary markers.
Step 1. The codes in a HTML document convey important information describing the characteristics of the Web page. Referring again to Figure 2, an example of the type of information contained in a hypertext document is shown. Certain information contained in the document of Figure 2 may be interpreted by a Web browser so that to the browser user, the images shown in Figures 1A and 1B appear. Certain information in the hypertext document is preserved during the translation process so that the translated page has, in general, the same appearance and behavior as the original page. Because HTML documents have a well-defined and known structure described by a grammar, automated translation of a HTML document is possible. The codes in the document may be discerned by the preprocessing software. Special boundary markers placed in the document by the preprocessing software indicate to the translation software that the intervening text should not be translated.
Consequently, the resulting page may have the same appearance and behavior as the original page.
Referring to Figure 3, an example of a preprocessed HTML document is shown. The HTML document of Figure 3 is the preprocessed version of the HTML document shown in Figure 2. In this example, the boundary markers used to identify the HTML codes are the character pairs "{." and ".}". Any character or character combination that does not normally occur in text may be used as a boundary marker. The line that appeared as "<head><title>NLT Home<title><head>" in Figure 2 (10) is preprocessed in Step 1 to the line "{.<head>.}{.<title>.}NLT Home{.<title>.}{.<head>.}" in Figure 3 (10). Other lines in the document are preprocessed similarly.
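As an illustration of Step 1, the following Python sketch wraps each code in the "{." and ".}" boundary markers. It is only a sketch under stated assumptions: the function name `preprocess` is hypothetical, and a single regular expression stands in for the grammar-aware preprocessor the description refers to.

```python
import re

OPEN_MARKER = "{."
CLOSE_MARKER = ".}"

# Assumption: a code is any run of characters from "<" to the next ">".
CODE_PATTERN = re.compile(r"<[^<>]*>")


def preprocess(html: str) -> str:
    """Surround every code with boundary markers so a translator can skip it."""
    return CODE_PATTERN.sub(
        lambda match: OPEN_MARKER + match.group() + CLOSE_MARKER, html
    )


if __name__ == "__main__":
    # The sample line is quoted as it appears in the description of Figure 2.
    print(preprocess("<head><title>NLT Home<title><head>"))
```

Applied to the Figure 2 line quoted above, the sketch yields the Figure 3 line quoted above. A production preprocessor would also need to handle codes that span lines and marker strings that already occur in the text.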
Step 2. Machine Translation (MT) software performs the translation of text from one language to another language. There are many commercially available MT
software packages.
Figure 4 is an illustration of a system in which MT software 10 takes as input text in one language 12 and generates a rough draft translation of the text in another language 14 using an electronic dictionary 16 and a set of linguistic and/or statistical rules encoded in the program 18. MT software can perform language conversion operations very quickly; in some cases, at speeds of up to 3,000 words per minute. The translated texts are not high quality translations, but they are usually adequate for understanding what the document is about.
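To put the quoted speeds side by side, the following Python sketch compares human and machine translation for one document. The rates come from this description (300-400 words per hour and 12 cents per word for a human translator, up to 3,000 words per minute for MT software); the 1,200-word document length is an assumption chosen only for illustration.

```python
# Rates quoted in the description.
HUMAN_WORDS_PER_HOUR_SLOW = 300
HUMAN_WORDS_PER_HOUR_FAST = 400
HUMAN_COST_CENTS_PER_WORD = 12
MT_WORDS_PER_MINUTE = 3000


def human_hours(words: int) -> tuple[float, float]:
    """Return the (fastest, slowest) human translation time in hours."""
    return (words / HUMAN_WORDS_PER_HOUR_FAST, words / HUMAN_WORDS_PER_HOUR_SLOW)


def human_cost_dollars(words: int) -> float:
    """Return the minimum human translation cost in dollars."""
    return words * HUMAN_COST_CENTS_PER_WORD / 100


def mt_seconds(words: int) -> float:
    """Return the machine translation time in seconds at the peak quoted speed."""
    return words * 60 / MT_WORDS_PER_MINUTE


if __name__ == "__main__":
    document_words = 1200  # assumed length, not from the description
    fastest, slowest = human_hours(document_words)
    print(f"Human: {fastest:.1f} to {slowest:.1f} hours, "
          f"at least ${human_cost_dollars(document_words):.2f}")
    print(f"MT: about {mt_seconds(document_words):.0f} seconds")
```

For the assumed 1,200-word document this works out to 3 to 4 hours and at least $144 for a human translator, against roughly 24 seconds for MT at its peak quoted speed.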
Referring to Figures 5A and 5B, an example of a translated HTML document is shown. The HTML document of Figures 5A and 5B is the translated version of the preprocessed HTML document shown in Figure 3. As described above, the boundary markers used to identify the HTML codes are the character pairs "{." and ".}". Consequently, the MT software ignores all text that falls between the boundary markers. Data characters that are not surrounded by boundary markers are translated by the MT software. The preprocessed line that appeared as "{.<head>.}{.<title>.}NLT Home{.<title>.}{.<head>.}" in Figure 3 (10) is translated in Step 2 to the line "{.<head>.}{.<title>.}NLT Maison{.<title>.}{.<head>.}" in Figure 5A (10).
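The marker-skipping behavior of Step 2 can be sketched in Python as follows. The word-for-word dictionary lookup is a deliberately crude stand-in for real MT software, and the names `TOY_DICTIONARY`, `translate_text`, and `translate_marked` are hypothetical; only the rule that text inside "{." and ".}" is left untouched comes from the description.

```python
import re

# Matches one marked span, from "{." to the nearest ".}".
MARKED_SPAN = re.compile(r"(\{\..*?\.\})", re.DOTALL)

# Assumption: a one-entry English-to-French dictionary in place of an MT engine.
TOY_DICTIONARY = {"Home": "Maison"}


def translate_text(text: str) -> str:
    """Replace each known word; unknown words pass through unchanged."""
    return re.sub(
        r"[A-Za-z]+",
        lambda match: TOY_DICTIONARY.get(match.group(), match.group()),
        text,
    )


def translate_marked(document: str) -> str:
    """Translate data characters while leaving marked codes as they are."""
    parts = MARKED_SPAN.split(document)
    # With one capturing group, re.split places the marked spans at odd indices.
    return "".join(
        part if index % 2 == 1 else translate_text(part)
        for index, part in enumerate(parts)
    )


if __name__ == "__main__":
    print(translate_marked("{.<head>.}{.<title>.}NLT Home{.<title>.}{.<head>.}"))
```

Because the translator never looks inside a marked span, a word such as "Home" is translated when it appears as data but preserved when it appears inside a code, for example in a link target.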
Step 3. In the final step, postprocessing software removes boundary markers from the translated document. Referring to Figure 6, an example of a postprocessed HTML document is shown. The HTML document of Figure 6 is the postprocessed version of the translated HTML document shown in Figures 5A and 5B. As described above, the boundary markers used to identify the HTML codes are the character pairs "{." and ".}". During postprocessing, these boundary markers are removed. The translated line that appeared as "{.<head>.}{.<title>.}NLT Maison{.<title>.}{.<head>.}" in Figure 5A (10) is postprocessed in Step 3 to the line "<head><title>NLT Maison<title><head>" in Figure 6 (10). The postprocessed HTML document of Figure 6 may then be displayed by the browser as shown in Figures 7A and 7B.
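Step 3 can be sketched in Python as a single substitution that strips the markers and keeps the code they enclose. The function name `postprocess` is hypothetical, and the pattern assumes that markers only ever wrap a single "<...>" code.

```python
import re

# Matches "{." + one code + ".}" and captures the code itself.
MARKED_CODE = re.compile(r"\{\.(<[^<>]*>)\.\}")


def postprocess(document: str) -> str:
    """Remove boundary markers, leaving the enclosed codes in place."""
    return MARKED_CODE.sub(r"\1", document)


if __name__ == "__main__":
    # The sample line is quoted as it appears in the description of Figure 5A.
    print(postprocess("{.<head>.}{.<title>.}NLT Maison{.<title>.}{.<head>.}"))
```

Matching the marker pair together with the code, rather than deleting every "{." and ".}" string, is a design choice of this sketch: it leaves any marker-like text in the translated data characters alone.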
Figure 8 is a diagrammatic view of one embodiment of the present invention in which machine translation is integrated into a Web browser. MT software 10 may be combined with a browser 12 to allow the user 14 to rapidly and automatically translate online documents from the World Wide Web 16 into his native language. The MT software 10 may be bundled with the browser 12 to form an integrated multilingual browser. The user 14 of the multilingual browser 16 selects the desired target language (e.g., French if the user speaks French), and the Web document retrieved by the browser 18 may be rapidly translated on the fly with a mouse click. The Web Browser 12 then displays for the user 14 the translated document 20. Optionally, the user may be able to update and edit parts of the MT software's electronic dictionaries to include terminology common to the Web sites he visits.
Although a document may be translated at the time that a user requests access to the document, a document may also be "pre-translated" and stored in a cache for later retrieval before a user seeks access to it. Documents that have been accessed at least once may also be stored following translation. The advantage of storing documents that have been translated is that delivery time to the user may be reduced. Although storing documents requires disk space, it may represent a better use of system resources because documents that are accessed frequently are translated once rather than every time they are accessed.
Figure 9 is a diagrammatic view of an alternative implementation in which pre-translated Web pages are stored on a Web server 14. The translation software resides on a translation server 14 (possibly the same machine as the Web server). Popular Web pages 24 are pre-translated and stored in a cache 28, with additional pages being added as they are requested by users 20. The cache is a dynamic storage device with a finite capacity. New, pretranslated pages are added to the cache, but pages will also be removed from the cache if they are used infrequently or if there are constraints on storage capacity.
In accessing the system, a user 10 sends to the Web Server 14 a request for a specific page in a specific language 12. The Web Server 14 then sends a request to get the desired page 16. The method for servicing the request depends on where the page is located. If the page has been pre-translated 24 and stored in the cache of pages in multiple languages 28, it is retrieved from the cache 26 and returned to the user in the requested language 30. If the page has not been pre-translated, then the page is retrieved 20 from the World Wide Web 22, translated into the requested language, and cached before being sent to the user 30.
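The request flow described for the Web Server configuration can be sketched in Python as a bounded cache keyed by page and language. This is an illustration under assumptions: the class name `TranslationCache`, the injected `fetch` and `translate` callables, and the least-recently-used eviction rule are not specified in the description, which says only that infrequently used pages are removed and that capacity is finite.

```python
from collections import OrderedDict
from typing import Callable


class TranslationCache:
    """Serve translated pages from a finite cache, translating on a miss."""

    def __init__(
        self,
        capacity: int,
        fetch: Callable[[str], str],
        translate: Callable[[str, str], str],
    ) -> None:
        self._capacity = capacity
        self._fetch = fetch          # retrieves the original page for a URL
        self._translate = translate  # translates page text into a language
        self._pages: OrderedDict[tuple[str, str], str] = OrderedDict()

    def get(self, url: str, language: str) -> str:
        key = (url, language)
        if key in self._pages:
            # Cache hit: mark the page as recently used and return it.
            self._pages.move_to_end(key)
            return self._pages[key]
        # Cache miss: retrieve, translate, and cache before returning.
        page = self._translate(self._fetch(url), language)
        self._pages[key] = page
        if len(self._pages) > self._capacity:
            # Assumed policy: evict the least recently used page.
            self._pages.popitem(last=False)
        return page


if __name__ == "__main__":
    cache = TranslationCache(
        capacity=2,
        fetch=lambda url: f"contents of {url}",
        translate=lambda text, language: f"[{language}] {text}",
    )
    print(cache.get("http://example.com/", "fr"))
```

Keying the cache on both the page and the requested language lets the same page be stored in several languages, matching the "cache of pages in multiple languages" in the description.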
Translation of Web pages, in either the bundled browser/MT configuration or the Web Server configuration, requires processing of HTML codes containing reference, command, and display information. Preferably, the HTML codes are identified prior to translation, then surrounded by special boundary markers to block the translation process on the codes. The HTML preprocessor uses its knowledge regarding the markups, codes, data characters and the structure of HTML documents to determine which codes should be blocked from the translation process. After translation is complete, a postprocessing program removes the special boundary markers so that the necessary references, commands, and display characteristics are available in the translated text.
The primary objective of the present invention is to allow a user of the Internet to read hypertext documents that are available only in a language foreign to the user. The readable text of the hypertext document is changed in accordance with the user's preferred language. Steps are taken to preserve the document's appearance and behavior so that the only noticeable difference between the original document and the translated document is the language of the text. Users may interact with the translated document and reference related documents in the same manner that users interact with the original document.

Claims (29)

THE EMBODIMENTS OF THE INVENTION IN WHICH AN EXCLUSIVE PROPERTY
OR PRIVILEGE IS CLAIMED ARE DEFINED AS FOLLOWS:
1. A method for translating a plurality of documents comprising codes and data characters in a first language, comprising the steps of:
specifying a second language;
transmitting a request for a first document from a browser to a web server;
transmitting said first document to a preprocessor, said preprocessor adapted to insert markers around said codes in said first document;
inserting markers around said codes in said first document;
transmitting said first document to a language translator, said language translator adapted to leave said marked codes untranslated;
translating said data characters in said first language to data characters in said second language using said language translator;
transmitting said first translated document to a postprocessor, said postprocessor adapted to remove said markers from said first translated document;
removing said markers from said first translated document;
displaying said first translated document at said browser;
selecting in said first translated document a link to a second document in said first language;
transmitting a request for said second document from a browser to a web server;
transmitting said second document to said preprocessor; inserting markers around codes in said second document;
transmitting said second document to said language translator;
translating said data characters in said first language to data characters in said second language using said language translator;
transmitting said second translated document to said postprocessor;
removing said markers from said second translated document;
and displaying said second translated document at said browser.
2. The method of claim 1, wherein said codes are HyperText Markup Language codes.
3. The method of claim 1, wherein said first and second documents are translated by said language translator and cached at said server prior to transmission of said requests for said first and second documents.
4. The method of claim 1, wherein said browser performs the steps of transmitting said first and second documents to a preprocessor, inserting markers around said codes in said first and second documents, transmitting said first and second documents to a language translator, translating said data characters in said first language to data characters in a second language, transmitting said first and second documents to a post processor, and removing said markers from said first and second translated documents.
5. The method of claim 1, wherein said server performs the steps of transmitting said first and second documents to a preprocessor, inserting markers around said codes in said first and second documents, transmitting said first and second documents to a language translator, translating said data characters in said first language to data characters in a second language, transmitting said first and second documents to a post processor, and removing said markers from said first and second translated documents.
6. A document translation system for translating documents comprising codes and data characters in a first language, said system comprising: a browser for requesting said documents from a web server in accordance with links in said documents and for specifying a second language;
a preprocessor for marking said codes in said documents;
a language translator for translating into said second language said data characters in said preprocessed documents; and a postprocessor for unmarking said codes in said translated documents.
7. The system of claim 6, wherein said codes are HyperText Markup Language codes.
8. The system of claim 6, wherein said preprocessor, said language translator, and said postprocessor are integrated into said browser.
9. The system of claim 6, wherein said preprocessor, said language translator, and said postprocessor are integrated into said server.
10. A method for translating documents, comprising the steps of:
defining in an original language a document containing display and reference codes and data characters exclusive of said display and reference codes;
storing said document at a server;
selecting a target language, said target language specified by a user of a browser;
requesting said document from said server, said document requested by said user in accordance with a link;
preprocessing said document to insert markers around said reference codes;
translating said data characters in said document from said original language to said target language;
postprocessing said document to remove said markers from around said reference codes; and
transmitting said document in said target language to a browser.
11. The method of claim 10 wherein said codes are HyperText Markup Language codes.
12. The method of claim 10 further comprising the step of storing said document in said target language in a cache at said server.
13. A system for automated translation of HTML documents comprising:
a browser for requesting and displaying HTML documents;
a server for processing browser requests for HTML documents;
a connection between said browser to said server for transmitting requests from said browser to said server and HTML documents from said server to said browser;
a request from said browser for a HTML document in a source language, said HTML document requested in accordance with a link to said HTML document in said source language;
a target language specified in accordance with said browser; and
machine translation software for translating said HTML document in said source language to a HTML document in said target language.
14. The system of claim 13 wherein said machine translation software comprises a preprocessor for inserting markers around codes in said HTML document in said source language, a language translator for translating said HTML document in said source language to a document in said target language, and a postprocessor for removing markers from around codes in said document in said target language.
15. The system of claim 13 wherein said machine translation software is integrated with said browser.
16. The system of claim 13 wherein said machine translation software is operable at said server.
17. The system of claim 13 further comprising a cache at said server for storing said HTML document in said target language.
18. A method for providing HTML documents in a plurality of languages comprising the steps of:
retrieving from a server a HTML document in a source language, said HTML document selected in accordance with a link;
determining a target language, said target language specified by a user of a browser;
translating said HTML document in said source language to a HTML document in said target language; and
displaying said HTML document in said target language at said browser.
19. The method of claim 18 further comprising the step of storing said HTML document in said target language in a cache at said server.
20. The method of claim 19 further comprising the steps of:
requesting from said server a HTML document in a source language;
determining a target language, said target language specified by a user of a browser requesting said HTML document in a source language;
locating said HTML document in said target language in said cache at said server; and
displaying said HTML document in said target language at said browser.
21. The method of claim 18 wherein the step of translating said HTML document in a source language to a HTML document in said target language is performed by said browser.
22. The method of claim 18 wherein the step of translating said HTML document in a source language to a HTML document in said target language is performed by said server.
23. The method of claim 18 wherein the step of translating said HTML document in a source language to a HTML document in said target language comprises the steps of:
preprocessing said HTML document in said source language to insert markers around codes in said HTML document in said source language;
translating said HTML document in said source language to a document in said target language; and
postprocessing said document in said target language to remove markers from around codes in said document in said target language.
24. A multilingual browser comprising:
a browser for requesting and displaying HTML documents, said browser adapted for specification of a target language; and
machine translation software integrated into said browser, said machine translation software adapted to translate a HTML document in a source language to a HTML document in said target language, in accordance with a user of said browser selecting a link to said HTML document in said source document.
25. The multilingual browser of claim 24 wherein said machine translation software comprises a preprocessor for inserting markers around codes in said HTML document in said source language, a language translator for translating said HTML document in said source language to a document in said target language, and a postprocessor for removing markers from around codes in said document in said target language.
26. A system for automated translation of HTML documents, comprising:
a plurality of web servers adapted to store a plurality of HTML documents in a source language;
a personal computer equipped with a browser for accessing HTML documents;
a server for processing requests from said browser at said personal computer for access to said plurality of HTML documents;
a target language specified by a user of said personal computer;
a request for one of said plurality of HTML documents in said source language;
a language translator for translating said requested HTML document in said source language to a HTML document in said target language; and
a display at said personal computer for displaying said HTML document in said target language.
27. The system of claim 26 wherein said language translator is integrated with said browser.
28. The system of claim 26 wherein said language translator is operable at said server.
29. The system of claim 28 wherein said HTML document in said target language is stored in a cache at said server.
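The preprocess, translate, postprocess sequence recited in claims 4, 5, 10, 14, 23 and 25, together with the server-side cache of claims 12, 17, 19 and 29, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the marker characters, the regular-expression tag matcher, the function names (preprocess, translate, postprocess, serve) and the toy word-for-word dictionary are all hypothetical and do not come from the claims.

```python
import re
from typing import Callable, Dict, Tuple

# Hypothetical marker characters; assumed never to occur in the documents.
OPEN, CLOSE = "\u27e6", "\u27e7"

# Simplified matcher for HTML codes (tags). Real HTML needs a proper parser.
TAG_RE = re.compile(r"<[^>]+>")
MARKED_RE = re.compile(re.escape(OPEN) + r".*?" + re.escape(CLOSE), re.DOTALL)


def preprocess(document: str) -> str:
    """Insert markers around every code so the translator can skip it."""
    return TAG_RE.sub(lambda m: OPEN + m.group(0) + CLOSE, document)


def translate(marked: str, translate_text: Callable[[str], str]) -> str:
    """Translate only the data characters that lie outside marked codes."""
    parts = []
    pos = 0
    for m in MARKED_RE.finditer(marked):
        parts.append(translate_text(marked[pos:m.start()]))
        parts.append(m.group(0))  # marked code passes through untouched
        pos = m.end()
    parts.append(translate_text(marked[pos:]))
    return "".join(parts)


def postprocess(translated: str) -> str:
    """Remove the markers, leaving the original codes in place."""
    return translated.replace(OPEN, "").replace(CLOSE, "")


def serve(
    url: str,
    target_lang: str,
    documents: Dict[str, str],
    translators: Dict[str, Callable[[str], str]],
    cache: Dict[Tuple[str, str], str],
) -> str:
    """Return the document in the target language, translating on a cache miss."""
    key = (url, target_lang)
    if key not in cache:
        marked = preprocess(documents[url])
        cache[key] = postprocess(translate(marked, translators[target_lang]))
    return cache[key]


# Toy stand-in for a language translator: a word-for-word dictionary lookup.
TOY_FR = {"hello": "bonjour", "world": "monde"}


def toy_french(text: str) -> str:
    return re.sub(
        r"[A-Za-z]+",
        lambda m: TOY_FR.get(m.group(0).lower(), m.group(0)),
        text,
    )


if __name__ == "__main__":
    docs = {"/index.html": '<p>Hello <a href="world.html">world</a></p>'}
    print(serve("/index.html", "fr", docs, {"fr": toy_french}, {}))
    # prints: <p>bonjour <a href="world.html">monde</a></p>
```

Because the link target world.html sits inside a marked code, the toy translator never sees it, so the link survives translation while the visible text changes. The cache is keyed on the pair of document address and target language, so a second request for the same pair is answered without invoking the translator again.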
CA002216387A 1995-11-13 1996-11-13 Integrated multilingual browser Expired - Lifetime CA2216387C (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US08/555,916 1995-11-13
US08/555,916 US6993471B1 (en) 1995-11-13 1995-11-13 Integrated multilingual browser
PCT/US1996/018102 WO1997018516A1 (en) 1995-11-13 1996-11-13 Integrated multilingual browser

Publications (2)

Publication Number Publication Date
CA2216387A1 CA2216387A1 (en) 1997-05-22
CA2216387C true CA2216387C (en) 2003-07-15

Family

ID=24219113

Family Applications (1)

Application Number Title Priority Date Filing Date
CA002216387A Expired - Lifetime CA2216387C (en) 1995-11-13 1996-11-13 Integrated multilingual browser

Country Status (5)

Country Link
US (3) US6993471B1 (en)
EP (1) EP0829053A4 (en)
AU (1) AU1406197A (en)
CA (1) CA2216387C (en)
WO (1) WO1997018516A1 (en)

Families Citing this family (99)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6993471B1 (en) * 1995-11-13 2006-01-31 America Online, Inc. Integrated multilingual browser
US6470306B1 (en) 1996-04-23 2002-10-22 Logovista Corporation Automated translation of annotated text based on the determination of locations for inserting annotation tokens and linked ending, end-of-sentence or language tokens
US6996609B2 (en) * 1996-05-01 2006-02-07 G&H Nevada Tek Method and apparatus for accessing a wide area network
EP0810533B1 (en) * 1996-05-29 2002-04-10 Matsushita Electric Industrial Co., Ltd. Document conversion apparatus
JP2001503540A (en) * 1996-06-14 2001-03-13 ロゴヴィスタ株式会社 Automatic translation of annotated text
AU7753998A (en) 1997-05-28 1998-12-30 Shinar Linguistic Technologies Inc. Translation system
GB9716887D0 (en) * 1997-08-08 1997-10-15 British Telecomm Translation
GB9727322D0 (en) 1997-12-29 1998-02-25 Xerox Corp Multilingual information retrieval
US6526426B1 (en) * 1998-02-23 2003-02-25 David Lakritz Translation management system
US8489980B2 (en) * 1998-02-23 2013-07-16 Transperfect Global, Inc. Translation management system
US10541973B2 (en) * 1998-02-23 2020-01-21 Transperfect Global, Inc. Service of cached translated content in a requested language
US6623529B1 (en) 1998-02-23 2003-09-23 David Lakritz Multilingual electronic document translation, management, and delivery system
GB2337611A (en) 1998-05-20 1999-11-24 Sharp Kk Multilingual document retrieval system
DE19936314A1 (en) * 1998-08-05 2000-02-17 Spyglass Inc Conversion process for document data that is communicated over the Internet uses data base of conversion preferences
US6925595B1 (en) 1998-08-05 2005-08-02 Spyglass, Inc. Method and system for content conversion of hypertext data using data mining
US7191393B1 (en) 1998-09-25 2007-03-13 International Business Machines Corporation Interface for providing different-language versions of markup-language resources
JP3055545B1 (en) * 1999-01-19 2000-06-26 富士ゼロックス株式会社 Related sentence retrieval device
US6353855B1 (en) 1999-03-01 2002-03-05 America Online Providing a network communication status description based on user characteristics
US7607085B1 (en) * 1999-05-11 2009-10-20 Microsoft Corporation Client side localizations on the world wide web
AU6405900A (en) * 1999-06-21 2001-01-09 Cleverlearn.Com Language teaching and translation system and method
SE9903986L (en) * 1999-11-03 2001-05-04 Tony Norman Procedure for creating a presentation in multiple versions
AU3741200A (en) * 1999-12-20 2001-07-03 Netzero, Inc. Method and apparatus employing a proxy server for modifying an html document supplied by a web server to a web client
AU765001B2 (en) * 2000-02-02 2003-09-04 Transperfect Global, Inc. Translation ordering system
AUPQ539700A0 (en) * 2000-02-02 2000-02-24 Worldlingo.Com Pty Ltd Translation ordering system
US7216072B2 (en) * 2000-02-29 2007-05-08 Fujitsu Limited Relay device, server device, terminal device, and translation server system utilizing these devices
KR100450881B1 (en) * 2000-03-16 2004-10-01 주식회사 유니소프트 System and Method for multi language translation
JP2003529845A (en) * 2000-03-31 2003-10-07 アミカイ・インコーポレイテッド Method and apparatus for providing multilingual translation over a network
KR100367675B1 (en) * 2000-04-27 2003-01-15 엘지전자 주식회사 Tv text information translation system and control method the same
US7437669B1 (en) * 2000-05-23 2008-10-14 International Business Machines Corporation Method and system for dynamic creation of mixed language hypertext markup language content through machine translation
FR2809509B1 (en) 2000-05-26 2003-09-12 Bull Sa SYSTEM AND METHOD FOR INTERNATIONALIZING THE CONTENT OF TAGGED DOCUMENTS IN A COMPUTER SYSTEM
WO2002001387A2 (en) * 2000-06-23 2002-01-03 Medtronic, Inc. Human language translation of patient session information from implantable medical devices
JP4011268B2 (en) * 2000-07-05 2007-11-21 株式会社アイアイエス Multilingual translation system
KR100387918B1 (en) * 2000-07-11 2003-06-18 이수성 Interpreter
US6993568B1 (en) 2000-11-01 2006-01-31 Microsoft Corporation System and method for providing language localization for server-based applications with scripts
AUPR329501A0 (en) * 2001-02-22 2001-03-22 Worldlingo, Inc Translation information segment
JP3379090B2 (en) * 2001-03-02 2003-02-17 インターナショナル・ビジネス・マシーンズ・コーポレーション Machine translation system, machine translation method, and machine translation program
US6999916B2 (en) * 2001-04-20 2006-02-14 Wordsniffer, Inc. Method and apparatus for integrated, user-directed web site text translation
US8214196B2 (en) 2001-07-03 2012-07-03 University Of Southern California Syntax-based statistical translation model
JP3809863B2 (en) * 2002-02-28 2006-08-16 インターナショナル・ビジネス・マシーンズ・コーポレーション server
AU2003269808A1 (en) 2002-03-26 2004-01-06 University Of Southern California Constructing a translation lexicon from comparable, non-parallel corpora
JP2003296223A (en) * 2002-03-29 2003-10-17 Fuji Xerox Co Ltd Method and device, and program for providing web page information
US7627479B2 (en) 2003-02-21 2009-12-01 Motionpoint Corporation Automation tool for web site content language translation
JP2004280352A (en) * 2003-03-14 2004-10-07 Ricoh Co Ltd Method and program for translating document data
US8230112B2 (en) * 2003-03-27 2012-07-24 Siebel Systems, Inc. Dynamic support of multiple message formats
US8548794B2 (en) 2003-07-02 2013-10-01 University Of Southern California Statistical noun phrase translation
US20050010419A1 (en) * 2003-07-07 2005-01-13 Ahmad Pourhamid System and Method for On-line Translation of documents and Advertisement
US7321852B2 (en) * 2003-10-28 2008-01-22 International Business Machines Corporation System and method for transcribing audio files of various languages
US8296127B2 (en) 2004-03-23 2012-10-23 University Of Southern California Discovery of parallel text portions in comparable collections of corpora and training using comparable texts
US8666725B2 (en) 2004-04-16 2014-03-04 University Of Southern California Selection and use of nonstatistical translation components in a statistical machine translation framework
WO2006042321A2 (en) 2004-10-12 2006-04-20 University Of Southern California Training for a text-to-text application which uses string to tree conversion for training and decoding
US8676563B2 (en) 2009-10-01 2014-03-18 Language Weaver, Inc. Providing human-generated and machine-generated trusted translations
US8886517B2 (en) 2005-06-17 2014-11-11 Language Weaver, Inc. Trust scoring for language translation systems
US10319252B2 (en) 2005-11-09 2019-06-11 Sdl Inc. Language capability assessment and training apparatus and techniques
JP2007207328A (en) * 2006-01-31 2007-08-16 Toshiba Corp Information storage medium, program, information reproducing method, information reproducing device, data transfer method, and data processing method
US8943080B2 (en) 2006-04-07 2015-01-27 University Of Southern California Systems and methods for identifying parallel documents and sentence fragments in multilingual document collections
US8886518B1 (en) 2006-08-07 2014-11-11 Language Weaver, Inc. System and method for capitalizing machine translated text
US8433556B2 (en) 2006-11-02 2013-04-30 University Of Southern California Semi-supervised training for statistical word alignment
US9122674B1 (en) * 2006-12-15 2015-09-01 Language Weaver, Inc. Use of annotations in statistical machine translation
US8468149B1 (en) 2007-01-26 2013-06-18 Language Weaver, Inc. Multi-lingual online community
US8615389B1 (en) 2007-03-16 2013-12-24 Language Weaver, Inc. Generation and exploitation of an approximate language model
US8831928B2 (en) 2007-04-04 2014-09-09 Language Weaver, Inc. Customizable machine translation service
US9361294B2 (en) 2007-05-31 2016-06-07 Red Hat, Inc. Publishing tool for translating documents
US10296588B2 (en) * 2007-05-31 2019-05-21 Red Hat, Inc. Build of material production system
US8825466B1 (en) 2007-06-08 2014-09-02 Language Weaver, Inc. Modification of annotated bilingual segment pairs in syntax-based machine translation
US20090007128A1 (en) * 2007-06-28 2009-01-01 International Business Machines Corporation method and system for orchestrating system resources with energy consumption monitoring
JP5656353B2 (en) * 2007-11-07 2015-01-21 インターナショナル・ビジネス・マシーンズ・コーポレーション International Business Machines Corporation Method and apparatus for controlling access of multilingual text resources
US7974832B2 (en) * 2007-12-12 2011-07-05 Microsoft Corporation Web translation provider
US20090162818A1 (en) * 2007-12-21 2009-06-25 Martin Kosakowski Method for the determination of supplementary content in an electronic device
US9201870B2 (en) * 2008-01-25 2015-12-01 First Data Corporation Method and system for providing translated dynamic web page content
US9110890B2 (en) * 2008-02-15 2015-08-18 International Business Machines Corporation Selecting a language encoding of a static communication in a virtual universe
US7698688B2 (en) * 2008-03-28 2010-04-13 International Business Machines Corporation Method for automating an internationalization test in a multilingual web application
CA2755427C (en) * 2009-03-18 2017-03-14 Google Inc. Web translation with display replacement
US8990064B2 (en) 2009-07-28 2015-03-24 Language Weaver, Inc. Translating documents based on content
US8380486B2 (en) 2009-10-01 2013-02-19 Language Weaver, Inc. Providing machine-generated translations and corresponding trust levels
US10417646B2 (en) 2010-03-09 2019-09-17 Sdl Inc. Predicting the cost associated with translating textual content
EP2680159B1 (en) 2010-07-13 2020-01-15 Motionpoint Corporation Dynamic language translation of a message
CN102467497B (en) * 2010-10-29 2014-11-05 国际商业机器公司 Method and system for text translation in verification program
US9164988B2 (en) * 2011-01-14 2015-10-20 Lionbridge Technologies, Inc. Methods and systems for the dynamic creation of a translated website
US9063931B2 (en) * 2011-02-16 2015-06-23 Ming-Yuan Wu Multiple language translation system
US11003838B2 (en) 2011-04-18 2021-05-11 Sdl Inc. Systems and methods for monitoring post translation editing
US8694303B2 (en) 2011-06-15 2014-04-08 Language Weaver, Inc. Systems and methods for tuning parameters in statistical machine translation
CN102855107B (en) 2011-06-30 2015-05-27 国际商业机器公司 Method and system for demonstrating file on computer
US8812295B1 (en) * 2011-07-26 2014-08-19 Google Inc. Techniques for performing language detection and translation for multi-language content feeds
US8886515B2 (en) 2011-10-19 2014-11-11 Language Weaver, Inc. Systems and methods for enhancing machine translation post edit review processes
US8942973B2 (en) 2012-03-09 2015-01-27 Language Weaver, Inc. Content page URL translation
US10261994B2 (en) 2012-05-25 2019-04-16 Sdl Inc. Method and system for automatic management of reputation of translators
US9116886B2 (en) * 2012-07-23 2015-08-25 Google Inc. Document translation including pre-defined term translator and translation model
US9152622B2 (en) 2012-11-26 2015-10-06 Language Weaver, Inc. Personalized machine translation via online adaptation
US9213694B2 (en) 2013-10-10 2015-12-15 Language Weaver, Inc. Efficient online domain adaptation
US20150254236A1 (en) * 2014-03-13 2015-09-10 Michael Lewis Moravitz Translation software built into internet
US9690780B2 (en) 2014-05-23 2017-06-27 International Business Machines Corporation Document translation based on predictive use
US10713699B1 (en) * 2014-11-14 2020-07-14 Andersen Corporation Generation of guide materials
CN105930320A (en) * 2016-04-15 2016-09-07 惠州Tcl移动通信有限公司 Word crossing and searching method and system based on mobile terminals
KR102056999B1 (en) * 2018-02-26 2019-12-17 러브랜드 가부시키가이샤 Web page translation system, web page translation device, web page providing device and web page translation method
US10803257B2 (en) * 2018-03-22 2020-10-13 Microsoft Technology Licensing, Llc Machine translation locking using sequence-based lock/unlock classification
US10540452B1 (en) * 2018-06-21 2020-01-21 Amazon Technologies, Inc. Automated translation of applications
US10922496B2 (en) 2018-11-07 2021-02-16 International Business Machines Corporation Modified graphical user interface-based language learning
US11373048B2 (en) * 2019-09-11 2022-06-28 International Business Machines Corporation Translation of multi-format embedded files
US11385916B2 (en) * 2020-03-16 2022-07-12 Servicenow, Inc. Dynamic translation of graphical user interfaces

Family Cites Families (37)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4774655A (en) 1984-10-24 1988-09-27 Telebase Systems, Inc. System for retrieving information from a plurality of remote databases having at least two different languages
JPS61105671A (en) 1984-10-29 1986-05-23 Hitachi Ltd Natural language processing device
JP2654001B2 (en) * 1986-05-08 1997-09-17 株式会社東芝 Machine translation method
GB2198565A (en) * 1986-11-28 1988-06-15 Sharp Kk Translation apparatus
GB2199170A (en) 1986-11-28 1988-06-29 Sharp Kk Translation apparatus
US4870610A (en) 1987-08-25 1989-09-26 Bell Communications Research, Inc. Method of operating a computer system to provide customed I/O information including language translation
US5005127A (en) * 1987-10-26 1991-04-02 Sharp Kabushiki Kaisha System including means to translate only selected portions of an input sentence and means to translate selected portions according to distinct rules
JP2831647B2 (en) * 1988-03-31 1998-12-02 株式会社東芝 Machine translation system
US5140521A (en) 1989-04-26 1992-08-18 International Business Machines Corporation Method for deleting a marked portion of a structured document
US5289375A (en) 1990-01-22 1994-02-22 Sharp Kabushiki Kaisha Translation machine
JPH03268062A (en) * 1990-03-19 1991-11-28 Fujitsu Ltd Register for private use word in machine translation electronic mail device
JP3114181B2 (en) 1990-03-27 2000-12-04 株式会社日立製作所 Interlingual communication translation method and system
US5175684A (en) 1990-12-31 1992-12-29 Trans-Link International Corp. Automatic text translation and routing system
US5497319A (en) 1990-12-31 1996-03-05 Trans-Link International Corp. Machine translation and telecommunications system
JP2815714B2 (en) 1991-01-11 1998-10-27 シャープ株式会社 Translation equipment
JP2765665B2 (en) 1991-08-01 1998-06-18 富士通株式会社 Translation device for documents with typographical information
JP2848729B2 (en) * 1991-12-06 1999-01-20 株式会社東芝 Translation method and translation device
US5243519A (en) 1992-02-18 1993-09-07 International Business Machines Corporation Method and system for language translation within an interactive software application
JP3038079B2 (en) 1992-04-28 2000-05-08 シャープ株式会社 Automatic translation device
JP3220560B2 (en) 1992-05-26 2001-10-22 シャープ株式会社 Machine translation equipment
US5373442A (en) 1992-05-29 1994-12-13 Sharp Kabushiki Kaisha Electronic translating apparatus having pre-editing learning capability
US5608622A (en) * 1992-09-11 1997-03-04 Lucent Technologies Inc. System for analyzing translations
JPH07210558A (en) 1994-01-20 1995-08-11 Fujitsu Ltd Machine translation device
US5822720A (en) * 1994-02-16 1998-10-13 Sentius Corporation System amd method for linking streams of multimedia data for reference material for display
US5740231A (en) 1994-09-16 1998-04-14 Octel Communications Corporation Network-based multimedia communications and directory system and method of operation
US5678039A (en) * 1994-09-30 1997-10-14 Borland International, Inc. System and methods for translating software into localized versions
US5675817A (en) * 1994-12-05 1997-10-07 Motorola, Inc. Language translating pager and method therefor
US5855015A (en) * 1995-03-20 1998-12-29 Interval Research Corporation System and method for retrieval of hyperlinked information resources
US5963205A (en) * 1995-05-26 1999-10-05 Iconovex Corporation Automatic index creation for a word processor
US5752246A (en) * 1995-06-07 1998-05-12 International Business Machines Corporation Service agent for fulfilling requests of a web browser
US5710918A (en) * 1995-06-07 1998-01-20 International Business Machines Corporation Method for distributed task fulfillment of web browser requests
US5721908A (en) * 1995-06-07 1998-02-24 International Business Machines Corporation Computer network for WWW server data access over internet
US5745360A (en) * 1995-08-14 1998-04-28 International Business Machines Corp. Dynamic hypertext link converter system and process
US5781785A (en) * 1995-09-26 1998-07-14 Adobe Systems Inc Method and apparatus for providing an optimized document file of multiple pages
US6993471B1 (en) * 1995-11-13 2006-01-31 America Online, Inc. Integrated multilingual browser
US5870610A (en) * 1996-06-28 1999-02-09 Siemens Business Communication Systems, Inc. Autoconfigurable method and system having automated downloading
US6493735B1 (en) * 1998-12-15 2002-12-10 International Business Machines Corporation Method system and computer program product for storing bi-directional language data in a text string object for display on non-bidirectional operating systems

Also Published As

Publication number Publication date
CA2216387A1 (en) 1997-05-22
WO1997018516A1 (en) 1997-05-22
US7716038B2 (en) 2010-05-11
US20050149315A1 (en) 2005-07-07
EP0829053A1 (en) 1998-03-18
US20080059148A1 (en) 2008-03-06
AU1406197A (en) 1997-06-05
US6993471B1 (en) 2006-01-31
US7292987B2 (en) 2007-11-06
EP0829053A4 (en) 1998-12-23

Similar Documents

Publication Publication Date Title
CA2216387C (en) Integrated multilingual browser
US6330529B1 (en) Mark up language grammar based translation system
US20020065658A1 (en) Universal translator/mediator server for improved access by users with special needs
RU2295150C2 (en) Segment of translation data
US6119078A (en) Systems, methods and computer program products for automatically translating web pages
US6405192B1 (en) Navigation assistant-method and apparatus for providing user configured complementary information for data browsing in a viewer context
KR100317401B1 (en) Apparatus and method for printing related web pages
US6073143A (en) Document conversion system including data monitoring means that adds tag information to hyperlink information and translates a document when such tag information is included in a document retrieval request
US6925595B1 (en) Method and system for content conversion of hypertext data using data mining
US5903727A (en) Processing HTML to embed sound in a web page
US6961737B2 (en) Serving signals
US6308198B1 (en) Method and apparatus for dynamically adding functionality to a set of instructions for processing a web document based on information contained in the web document
EP0834853A2 (en) Method and apparatus for presenting client side image maps
US20010014895A1 (en) Method and apparatus for dynamic software customization
US7756849B2 (en) Method of searching for text in browser frames
JP4990302B2 (en) Data processing method, data processing program, and data processing apparatus
US20020188435A1 (en) Interface for submitting richly-formatted documents for remote processing
KR20040101468A (en) Method, system, computer program product and storage device for displaying a document
US6035338A (en) Document browse support system and document processing system
Iaccarino et al. Personalizable edge services for web accessibility
US6636235B1 (en) Lettering adjustments for display resolution
WO2002080133A1 (en) Non visual presentation of salient features in a document
US8806326B1 (en) User preference based content linking
KR20020042026A (en) Pre-processor and method and apparatus for processing web documents using the same
KR20010103545A (en) Storage medium, system and apparatus for Internet translation with advertisement

Legal Events

Date Code Title Description
EEER Examination request
MKEX Expiry

Effective date: 20161114