WO2012115507A1 - Website translator, system and method - Google Patents

Website translator, system and method

Publication number
WO2012115507A1
WO2012115507A1 PCT/NL2012/000016 NL2012000016W WO2012115507A1 WO 2012115507 A1 WO2012115507 A1 WO 2012115507A1 NL 2012000016 W NL2012000016 W NL 2012000016W WO 2012115507 A1 WO2012115507 A1 WO 2012115507A1
Authority
WO
WIPO (PCT)
Prior art keywords
information
website
translated
translation
translator
Application number
PCT/NL2012/000016
Other languages
French (fr)
Inventor
Danny De Wit
Original Assignee
Exvo.Com Group B.V.
Application filed by Exvo.Com Group B.V.
Publication of WO2012115507A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/40: Processing or translation of natural language
    • G06F40/58: Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation

Definitions

  • the present invention relates to a website translator, system, and method for providing a translated version of a webpage of a website in response to a HyperText Transfer
  • HTTP HyperText Transfer Protocol
  • Translators are known that are incorporated in the web browser. A user wanting to have a specific webpage of a website translated first has to visit the webpage, which then is still in an undesired language. The user may then operate a translate button or other function to instruct the browser to translate any text content in the currently displayed webpage. Similarly, a script may be run which consults a remote computer translator to provide
  • the present invention provides an improved website translator.
  • the website translator comprises a first receiving unit arranged for receiving the HTTP request, and an extracting unit arranged for extracting a language identifier from the HTTP request and/or from an Internet Protocol (IP) address of the client computer, the language identifier corresponding to a
  • IP Internet Protocol
  • the website translator further comprises a
  • forwarding unit for forwarding the HTTP request to the first server, and a second receiving unit arranged for receiving HTML information from the first server in response to the HTTP request.
  • a modifying unit is then used for modifying the received HTML information by replacing information to be translated in the received HTML information with a
  • the HTTP request is not directed to the first server on which the website is hosted. Instead, the HTTP request is directed to the website translator which is arranged in between the user and the first server.
  • a HTTP request comprises a host name of the server to which the HTTP request is directed as well as specifics of the web browser and/or computer system of the client
  • IP address is a form of location information enabling the geographical location of the user to be determined.
  • both the IP address and the information regarding the language that is used on the client computer are examples of a language identifier as they both allow a language used and/or desired by the user of the client computer to be determined.
  • the website translator forwards the HTTP request to the first server which hosts the website to be translated.
  • the website translator acts as a client
  • the HTML information that is received in return e.g. a HTML page
  • relevant information is extracted from the HTML information, e.g. text portions between HTML tags, and is subsequently
  • the user need not have any knowledge about the location of the first server. Moreover, as the translation is done real-time there is no need to store a copy of the translated website or webpage. Because the language in which the webpage is presented to the user always corresponds to the language settings already used and/or desired, the user is not first confronted with a webpage in a language he cannot understand.
  • the present invention several sources of information can be used to determine the language used and/or desired by the user. It is possible to offer the user an option if these sources provide multiple languages. In such case, a script can be run on the website translator establishing communication between the user and the website translator prompting the user to select between several options.
  • the present invention may even be modified such that the extracting unit no longer extracts the desired information from the HTTP request but that it will always determine this information using input from the user as described above.
  • the website translator further comprises a server database having stored therein a correlation between a host name extracted from the HTTP request and a host name of the first server.
  • the extracting unit is then arranged to extract a host name from the HTTP request, whereas the forwarding unit is arranged to forward the HTTP request to the first server having a host name that correlates with the host name extracted from the HTTP request.
  • the first server could have the host name "original.company.com"
  • the host name of the website translator is "company.com".
  • the server database comprises the correlation between
  • the HTTP request comprises the host name "company.com".
  • the forwarding unit is able to send the HTTP request from the website translator to the first server having the host name "original.company.com" on which the website to be translated is hosted.
  • host name should be interpreted as an Internet host name that represents a domain name assigned to a host computer.
  • the website translator has an Internet Protocol (IP) address linked to a plurality of host names
  • the server database comprises a correlation for each of the host names with a host name of a respective first server.
  • the use of multiple host names for a single IP address allows the website translator to be operative for multiple websites at the same time. For instance, the website
  • translator could be configured to provide translations of a webpage "page 1" hosted on a first server having host name "original.company1" corresponding to "company 1" and of a webpage "page 2" hosted on a first server having host name "original.company2" corresponding to "company 2".
  • two host names could be attributed to the IP address of the website translator, e.g. "company1.com" and
  • the server database then comprises a
  • the modifying unit is arranged for extracting text content in the HTML information.
  • the information to be translated could comprise a word, a phrase or part thereof, or a paragraph of the extracted text content.
  • text content is embedded in between HTML tags. By scanning a HTML document, the relevant text content can be filtered out for instance based on the HTML tags.
  • the modifying unit is arranged for extracting a link to media content in the HTML information and to replace the link with a link to a translated version of the media content.
  • the translated version of the media content can be stored on a server different from the first server. It may reside on a dedicated server within the website translator or group of website translators. Such server may even comprise the server database and the content database, which is discussed later.
  • the website translator further comprises a computer translator for providing the translated version as a computer translation of the information to be translated.
  • Computer translators are known. Within the context of the present invention, a computer translator is construed as an automated translation apparatus which provides a
  • the website translator further comprises a content database having stored therein a correlation between information to be translated and a translation of this information.
  • the content database could comprise the correlation between "Hello" (for English) and "Bonjour" (for French) and "Hallo" (for Dutch).
  • the content database need not be restricted to a correlation between two
  • the content database may include correlations between entire phrases, e.g. between "How are you" (for English), "Comment allez-vous" (for French), and "Hoe gaat het met u" (for Dutch).
  • modifying unit is arranged to replace the information to be translated in the HTML information with the translation of this information if this translation is available in the content database.
  • the HTML information e.g. a HTML page
  • the HTML information is scanned for text content.
  • translation for the text content is taken from the content database. It is noted that multiple translations are needed as a single HTML page normally contains several pieces of text content.
  • the modifying unit is arranged to control the computer translator to provide a computer translation of the information to be translated and to replace the
  • the website translator comprises a content database management unit arranged for storing a computer translation of information to be translated in the content database. Hence, if a translation is not available in the content database, the generated computer translation can be stored in the content database for future use.
  • the content database management unit is arranged for storing an
  • translations can be translations that are not generated by a computer but by a human translator. It is even possible that the content database comprises both the computer translation and the human translation. Attributes can be used to
  • the content database comprises a plurality of entries, each entry comprising a correlation between a hash of the information to be translated and at least one translation of this information.
  • the content database management unit is preferably arranged to hash the information to be translated by applying a predefined hash function to this information and to correlate the hashed information with the at least one translation of this information.
  • a hash is generated, for instance in the form of a hexadecimal number which is unique for that text segment.
  • By using hash functions, the content database can easily be searched and constructed.
  • the content database management unit is further arranged to assign an attribute to the translation of the information to be translated.
  • the attribute allows different translations for the same information to be translated, e.g. a text segment, to be distinguished.
  • the attributes may contain information regarding the translation, such as status, origin, language, and quality.
  • the content database preferably comprises a correlation between information to be translated and at least two translations thereof, wherein the modifying unit is arranged to select one of said at least two translations based on the attributes assigned to the at least two translations. For instance, the modifying unit may select a particular
  • the attribute can therefore be a country or language code
  • the modifying unit can be arranged to select one of said at least two translations based on the extracted language identifier, such as a language code and/or IP address.
  • modifying unit is arranged for modifying the received HTML information by replacing Cascading Style Sheets (CSS)
  • a webpage contains layout information.
  • This information can be incorporated as in-line CSS statements or the HTML code may comprise a link to a CSS file stored for instance in the first server. According to the present invention, this layout information can be replaced with layout information specific for the desired and/or used language.
  • the present invention also provides a website
  • At least one of the content database, the content database management unit, the computer translator, and the server database is preferably arranged as a central unit common to each of the plurality of website translators. This allows multiple website
  • translators to use a common resource. This is particularly advantageous for the content database. As the network of website translators grows, so does the content in the content database. This allows the modifying unit to select a translation from the content database in more cases and for more languages instead of having to resort to a computer translation. Moreover, selecting a translation from the content database will in most cases prove to be a faster solution than providing a computer translation.
  • the present invention also provides a method for providing a translated version of a webpage of a website in response to a HyperText Transfer Protocol (HTTP) request from a client computer, wherein the website is hosted on a first server having a first host name.
  • modifying the received HTML information by replacing information to be translated in the received HTML information with a translated version thereof in correspondence with the used and/or desired language; f. sending the modified HTML information to the client computer for display as the translated version of the webpage of the website.
  • the method could further comprise providing a content database having stored therein a correlation between
  • the replacing of information to be translated then comprises:
  • Figure 1 illustrates an embodiment of a website
  • Figure 2 shows a network comprising a plurality of website translators according to the present invention.
  • Figure 1 shows an embodiment of a website translator according to the present invention.
  • a user may operate a client computer 1 to send a HTTP request to website
  • Website translator 2 receives the HTTP request using a first receiving unit 3.
  • the HTTP (HTTP/1.1) request typically comprises a request line, such as "GET /home/index.html HTTP/1.1" and a host header such as "host: www.example.com".
  • This HTTP request would request the webpage www.example.com/home/index.html from a server having the host name www.example.com.
  • Website translator 2 is therefore able to extract the host name from the HTTP request.
  • website translator 2 has an Internet Protocol (IP) address acting as the physical address the HTTP request is sent to. It should be noted that multiple host names can be attributed to a single IP address.
  • a language identifier is extracted by extracting unit 4 from the HTTP request which corresponds to a language used and/or desired by the user.
  • An example is the "accept-language" header field in the HTTP request.
  • the IP address of client computer 1 may be used as this address is related to the geographical location of client computer 1, thereby giving another indication of the
  • Website translator 2 comprises a server database 5 having stored therein a relation between the host name extracted from the HTTP request and the host name of the server which hosts the website requested by the user. For instance, such relation could be "example.com" and
  • Forwarding unit 6 uses server database 5 to forward the received HTTP request to first server 7 based on the host name that is correlated with the host name extracted from the received HTTP request.
  • first server 7 will send HTML information, e.g. in the form of a HTML page, to second receiving unit 8 of website translator 2.
  • a HTML page comprises a plurality of HTML elements. These elements comprise a pair of tags, a start tag and an end tag, as well as some element attributes within the start tag and textual or graphical content between the start and end tags.
  • Website translator 2 comprises a modifying unit 9 for modifying the received HTML information. It does so by scanning the received HTML information, e.g. the HTML page, looking for tags and extracting the content in between the various start and end tags. If text content is found, modifying unit 9 will consult content database 10 to look for a translation of the extracted text content into the desired and/or used language of the user. Modifying unit 9 will then replace the original text content in the HTML information with the translated version thereof. Consequently, any text content in the requested webpage is translated. Possible forms of text content are single words, phrases or parts thereof, paragraphs, or even entire pages. It should however be noted that it is advantageous to use moderately large units, e.g. phrases or parts thereof, to facilitate the reuse of the translation. On the one hand, the parts should be large enough to enable a context to be established between neighboring words; on the other hand, they should not be too large, as this would severely limit the chance that another webpage comprises an identical piece of text.
  • a webpage may comprise a link to a media file, such as a picture.
  • This picture may contain text information as well.
  • Modifying unit 9 can therefore be arranged to extract the links to these media from the HTML information, and to replace the links with other links that point to corresponding media files, albeit in a different language. These translated versions of these media need not be located on first server 7.
  • a sending unit 11 sends the modified HTML
  • Website translator 2 may comprise a computer translator 12 for providing the translations of the text content.
  • Website translator 2 may be arranged to retrieve the
  • the modifying unit 9 first inspects content database 10 to look for a translation and to use such translation if it is present. However, in case such
  • computer translator 12 may provide one.
  • Website translator 2 may comprise a content database management unit 13 to manage the addition and deletion of translations into content database 10 as well as the access to it. To that end, it may receive translations for text content from one or more human translators 14 and/or from computer translator 12.
  • content database management unit 13 applies a hashing function to the information to be translated. For instance, the word "hello" becomes the hash 0x1245fd4e5, being a hexadecimal number. In content database 10, this hash is linked to a translation.
  • When modifying unit 9 needs a translation of a piece of text, it will consult content database 10 via content database management unit 13, wherein the content database management unit 13 will first hash the information to be translated and subsequently look for a corresponding translation in content database 10. If such translation is found, that translation is returned to modifying unit 9. In case such translation is not found, computer translator 12 is consulted to provide one. This translation may subsequently be linked by the content database management unit 13 to the hash of the information to be translated to enable future use.
  • Of these three translations in Spanish, two are generated by a person, whereas one is generated by computer translator 12. To distinguish between these translations, attributes may be assigned to them. For instance, the Spanish translation may contain the
  • Modifying unit 9 can be arranged to select one of the available translations based on the attributes assigned to them. For instance, if the IP address of client computer 1 reveals that the user is located in Chile, modifying unit 9 will use the South
  • the invention is particularly well suited to handle dynamic content, i.e. a webpage for which the text content changes often.
  • a translation is available in content database 10.
  • the system can be configured to provide a computer translation.
  • a request to a human translator may be issued.
  • the translation provided by the translator can be used in addition to or instead of the computer translation. In this way, the translated version of the webpage is always
  • the present invention can easily be scaled, wherein a plurality of website translators 2 are arranged to each provide a translation of at least one website hosted on different first servers. Such network is illustrated in figure 2. Each of the two depicted website translators 2 is arranged to provide a translation of two different websites that are hosted on two different first servers. Again, the information in server database is used to correctly
  • content database, content database management unit and server database can be arranged as common central units accessible for each website translator.

Abstract

The present invention relates to a website translator, system, and method for providing a translated version of a webpage of a website in response to a HyperText Transfer Protocol (HTTP) request from a client computer, wherein the website is hosted on a first server having a first host name. According to the present invention, the website translator is arranged in between the client computer and the actual server that hosts the website. The website translator then replaces text content in the HTML information that is received from the actual server and sends the modified HTML information to the client computer as the translated version of the webpage.

Description

WEBSITE TRANSLATOR, SYSTEM AND METHOD
The present invention relates to a website translator, system, and method for providing a translated version of a webpage of a website in response to a HyperText Transfer Protocol (HTTP) request from a client computer, wherein the website is hosted on a first server having a first host name.
Translators are known that are incorporated in the web browser. A user wanting to have a specific webpage of a website translated first has to visit the webpage, which then is still in an undesired language. The user may then operate a translate button or other function to instruct the browser to translate any text content in the currently displayed webpage. Similarly, a script may be run which consults a remote computer translator to provide translations of the various text strings in the webpage.
The present invention provides an improved website translator. According to the present invention, the website translator comprises a first receiving unit arranged for receiving the HTTP request, and an extracting unit arranged for extracting a language identifier from the HTTP request and/or from an Internet Protocol (IP) address of the client computer, the language identifier corresponding to a language used and/or desired by a user of the client computer. The website translator further comprises a forwarding unit for forwarding the HTTP request to the first server, and a second receiving unit arranged for receiving HTML information from the first server in response to the HTTP request. A modifying unit is then used for modifying the received HTML information by replacing information to be translated in the received HTML information with a translated version thereof in correspondence with the used and/or desired language. Finally, the modified HTML information is sent to the client computer for display as the translated version of the website using a sending unit.
According to the present invention, the HTTP request is not directed to the first server on which the website is hosted. Instead, the HTTP request is directed to the website translator which is arranged in between the user and the first server.
A HTTP request comprises a host name of the server to which the HTTP request is directed as well as specifics of the web browser and/or computer system of the client computer from which the HTTP request originates. Examples are the IP address of the client computer, the web browser that is used, and the language that is used on the client computer. Part of this information can for instance be extracted from the browser settings and/or operating system settings, which are available in the HTTP request. The IP address is a form of location information enabling the geographical location of the user to be determined. Within the context of the present invention, both the IP address and the information regarding the language that is used on the client computer are examples of a language identifier as they both allow a language used and/or desired by the user of the client computer to be determined.
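As an illustration only, the language identifier could be taken from the "accept-language" header field of the HTTP request. The sketch below assumes the headers are already available as a dictionary with lower-case keys; the function names and the list of supported languages are hypothetical and not part of the described translator. Deriving a language from the IP address would need a geolocation lookup, which is not shown.

```python
from typing import Dict, List


def parse_accept_language(header_value: str) -> List[str]:
    """Return the language tags of an Accept-Language value, most preferred first."""
    weighted = []
    for position, item in enumerate(header_value.split(",")):
        parts = item.strip().split(";")
        tag = parts[0].strip().lower()
        if not tag or tag == "*":
            continue  # ignore empty items and the wildcard
        quality = 1.0  # weight used when no q parameter is present
        for param in parts[1:]:
            name, _, value = param.strip().partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # treat a malformed weight as "not acceptable"
        if quality > 0:
            weighted.append((-quality, position, tag))
    # Sort by descending weight; header order breaks ties.
    return [tag for _, _, tag in sorted(weighted)]


def extract_language(headers: Dict[str, str], supported: List[str], default: str = "en") -> str:
    """Pick the first preferred language that is in the (assumed) supported list."""
    for tag in parse_accept_language(headers.get("accept-language", "")):
        primary = tag.split("-")[0]  # "nl-nl" -> "nl"
        if primary in supported:
            return primary
    return default
```

A request carrying `accept-language: nl-NL,nl;q=0.9,en;q=0.8` would resolve to `nl` when Dutch is among the supported languages; a request without the header falls back to the default.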
The website translator forwards the HTTP request to the first server which hosts the website to be translated. In this case, the website translator acts as a client requesting a webpage from the first server. However, the HTML information that is received in return, e.g. a HTML page, is not relayed directly to the user. Instead, relevant information is extracted from the HTML information, e.g. text portions between HTML tags, and is subsequently replaced by translated versions thereof. This is advantageous because the layout of the website can remain the same. Finally, the modified HTML information is sent to the user.
According to the present invention, the user need not have any knowledge about the location of the first server. Moreover, as the translation is done in real time, there is no need to store a copy of the translated website or webpage. Because the language in which the webpage is presented to the user always corresponds to the language settings already used and/or desired, the user is not first confronted with a webpage in a language he cannot understand.
According to the present invention, several sources of information can be used to determine the language used and/or desired by the user. It is possible to offer the user an option if these sources provide multiple languages. In such case, a script can be run on the website translator establishing communication between the user and the website translator prompting the user to select between several options. The present invention may even be modified such that the extracting unit no longer extracts the desired information from the HTTP request but that it will always determine this information using input from the user as described above.
In an embodiment of the present invention, the website translator further comprises a server database having stored therein a correlation between a host name extracted from the HTTP request and a host name of the first server. The extracting unit is then arranged to extract a host name from the HTTP request, whereas the forwarding unit is arranged to forward the HTTP request to the first server having a host name that correlates with the host name extracted from the HTTP request. For instance, the first server could have the host name "original.company.com", whereas the host name of the website translator is "company.com". In this case, the server database comprises the correlation between "original.company.com" and "company.com". The HTTP request comprises the host name "company.com". By using the information in the server database, the forwarding unit is able to send the HTTP request from the website translator to the first server having the host name "original.company.com" on which the website to be translated is hosted.
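The host-name lookup described above can be sketched as follows. This is a minimal illustration, assuming the server database is a plain in-memory dictionary and the request is available as raw HTTP/1.1 text; `SERVER_DATABASE`, `extract_host` and `resolve_first_server` are hypothetical names, and the port handling does not cover IPv6 literals.

```python
from typing import Dict, Optional

# Hypothetical server database: host name in the request -> host name of the first server.
SERVER_DATABASE: Dict[str, str] = {
    "company.com": "original.company.com",
}


def extract_host(raw_request: str) -> Optional[str]:
    """Return the Host header value of a raw HTTP/1.1 request, without any port."""
    lines = raw_request.split("\r\n")
    for line in lines[1:]:  # the first line is the request line
        if line == "":
            break  # a blank line ends the header section
        name, separator, value = line.partition(":")
        if separator and name.strip().lower() == "host":
            return value.strip().lower().split(":")[0]
    return None


def resolve_first_server(raw_request: str) -> Optional[str]:
    """Look up the first server that correlates with the requested host name."""
    host = extract_host(raw_request)
    if host is None:
        return None
    return SERVER_DATABASE.get(host)
```

The forwarding unit would then re-issue the request against the returned host name; how that connection is made is left out of the sketch.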
It should be noted that within the present invention, host name should be interpreted as an Internet host name that represents a domain name assigned to a host computer.
In a further embodiment of the present invention, the website translator has an Internet Protocol (IP) address linked to a plurality of host names, and the server database comprises a correlation for each of the host names with a host name of a respective first server.
The use of multiple host names for a single IP address allows the website translator to be operative for multiple websites at the same time. For instance, the website translator could be configured to provide translations of a webpage "page 1" hosted on a first server having host name "original.company1" corresponding to "company 1" and of a webpage "page 2" hosted on a first server having host name "original.company2" corresponding to "company 2". In this case, two host names could be attributed to the IP address of the website translator, e.g. "company1.com" and "company2.com". The server database then comprises a correlation between "company1.com" and "original.company1" and between "company2.com" and "original.company2". A HTTP request for "page 1" from the client computer is then forwarded to the first server "original.company1" where this webpage is generated or stored.
In an embodiment of the present invention, the modifying unit is arranged for extracting text content in the HTML information. For instance, the information to be translated could comprise a word, a phrase or part thereof, or a paragraph of the extracted text content. Typically, in a webpage text content is embedded in between HTML tags. By scanning a HTML document, the relevant text content can be filtered out for instance based on the HTML tags.
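One possible way to scan a HTML document and replace the text between tags, while leaving the tags themselves untouched, is sketched below using Python's standard `html.parser` module. The class and function names are hypothetical, the `translate` callback stands in for whatever provides the translation, and skipping `script` and `style` content is an assumption of this sketch rather than something the text prescribes.

```python
from html import escape
from html.parser import HTMLParser
from typing import Callable, List


class TextReplacer(HTMLParser):
    """Rebuild a HTML document, passing each text node through a translate callback."""

    SKIPPED = {"script", "style"}  # content of these elements is copied verbatim

    def __init__(self, translate: Callable[[str], str]) -> None:
        super().__init__(convert_charrefs=True)
        self.translate = translate
        self.out: List[str] = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self.skip_depth += 1
        self.out.append(self.get_starttag_text())  # original tag text, attributes included

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())  # e.g. <br/>

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self.skip_depth:
            self.skip_depth -= 1
        self.out.append(f"</{tag}>")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")

    def handle_data(self, data):
        if self.skip_depth or not data.strip():
            self.out.append(data)  # scripts, styles and pure whitespace stay as they are
            return
        # Keep surrounding whitespace so the layout of the source is preserved.
        leading = data[: len(data) - len(data.lstrip())]
        trailing = data[len(data.rstrip()):]
        translated = self.translate(data.strip())
        self.out.append(leading + escape(translated, quote=False) + trailing)


def translate_html(html: str, translate: Callable[[str], str]) -> str:
    """Return the document with every text node replaced by its translation."""
    parser = TextReplacer(translate)
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.out)
```

Because only text nodes are rewritten, element structure and attributes are carried over as received, which matches the stated aim of keeping the layout of the website the same.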
In an embodiment of the present invention, the modifying unit is arranged for extracting a link to media content in the HTML information and to replace the link with a link to a translated version of the media content. The translated version of the media content can be stored on a server different from the first server. It may reside on a dedicated server within the website translator or group of website translators. Such server may even comprise the server database and the content database, which is discussed later.
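A rough sketch of the link replacement follows. It assumes that translated media are registered in a simple table keyed by original URL and language, and it only looks at double-quoted `src` attributes using a regular expression; the table contents, the URLs and the function name are made up for illustration, and a real implementation would more likely work on parsed HTML.

```python
import re
from typing import Dict, Tuple

# Hypothetical table: (original media URL, language) -> URL of the translated media.
MEDIA_TRANSLATIONS: Dict[Tuple[str, str], str] = {
    ("/img/banner.png", "fr"): "https://media.translator.example/fr/banner.png",
}

# Matches src="..." and captures the prefix, the URL and the closing quote.
SRC_ATTRIBUTE = re.compile(r'(\bsrc\s*=\s*")([^"]*)(")', re.IGNORECASE)


def replace_media_links(html: str, language: str) -> str:
    """Swap media links for their translated versions where one is registered."""

    def swap(match: "re.Match[str]") -> str:
        url = match.group(2)
        # Fall back to the original link when no translated version is known.
        translated = MEDIA_TRANSLATIONS.get((url, language), url)
        return match.group(1) + translated + match.group(3)

    return SRC_ATTRIBUTE.sub(swap, html)
```

Links without a registered translation are left pointing at the first server, so the page still renders when only some media have been translated.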
In an embodiment of the present invention, the website translator further comprises a computer translator for providing the translated version as a computer translation of the information to be translated.
Computer translators are known. Within the context of the present invention, a computer translator is construed as an automated translation apparatus which provides a translation of inputted text into a desired language. According to the present invention, this language is determined from the HTTP request and/or the IP address of the client computer.
In an embodiment of the present invention, the website translator further comprises a content database having stored therein a correlation between information to be translated and a translation of this information. For instance, the content database could comprise the correlation between "Hello" (for English) and "Bonjour" (for French) and "Hallo" (for Dutch). As such, the content database need not be restricted to a correlation between two languages. Moreover, the content database may include correlations between entire phrases, e.g. between "How are you" (for English), "Comment allez-vous" (for French), and "Hoe gaat het met u" (for Dutch).
In an embodiment of the present invention, the modifying unit is arranged to replace the information to be translated in the HTML information with the translation of this information if this translation is available in the content database. Hence, upon receiving the HTML information from the first server, the HTML information, e.g. a HTML page, is scanned for text content. Subsequently, a translation for the text content is taken from the content database. It is noted that multiple translations are needed as a single HTML page normally contains several pieces of text content.
However, it may happen that a translation is not available, for instance because the original website hosted on the first server has been changed. In such case, it is advantageous if the modifying unit is arranged to control the computer translator to provide a computer translation of the information to be translated and to replace the information to be translated in the HTML information with the computer translation of this information if the content database does not comprise a translation of this information. This ensures that a translation is always provided for.
In an embodiment of the present invention, the website translator comprises a content database management unit arranged for storing a computer translation of information to be translated in the content database. Hence, if a translation is not available in the content database, the generated computer translation can be stored in the content database for future use.
In an embodiment of the present invention, the content database management unit is arranged for storing an externally received translation of the information to be translated in the content database. Examples of such translations can be translations that are not generated by a computer but by a human translator. It is even possible that the content database comprises both the computer translation and the human translation. Attributes can be used to distinguish between these translations, as will be discussed later.
In an embodiment of the present invention, the content database comprises a plurality of entries, each entry comprising a correlation between a hash of the information to be translated and at least one translation of this information. Here, the content database management unit is preferably arranged to hash the information to be translated by applying a predefined hash function to this information and to correlate the hashed information with the at least one translation of this information.
By applying a hashing function to a text segment, a hash is generated, for instance in the form of a hexadecimal number which is unique for that text segment. By using hash functions, the content database can easily be searched and constructed.
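The hashed entries described above can be illustrated with a minimal Python sketch. The text does not name a hash function, so the use of SHA-256, the whitespace normalization, and the names `segment_hash`, `content_database`, and `lookup` are all assumptions made for illustration. Note that a digest of this kind is unique only with very high probability, not in a strict sense.

```python
import hashlib
from typing import Dict, Optional


def segment_hash(text: str) -> str:
    """Return a hexadecimal digest identifying a text segment."""
    # Assumption: runs of whitespace are collapsed so that trivially
    # different copies of the same segment share one key.
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


# Each entry correlates the hash of a segment with one or more translations.
content_database: Dict[str, Dict[str, str]] = {
    segment_hash("Hello"): {"fr": "Bonjour", "nl": "Hallo"},
}


def lookup(text: str, language: str) -> Optional[str]:
    """Look up a stored translation by hashing the segment first."""
    entry = content_database.get(segment_hash(text))
    return None if entry is None else entry.get(language)
```

Keying on a fixed-length digest keeps lookups independent of segment length, which is one way to read the remark that the database "can easily be searched and constructed".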
In an embodiment of the present invention, the content database management unit is further arranged to assign an attribute to the translation of the information to be translated. The attribute allows different translations for the same information to be translated, e.g. a text segment, to be distinguished. Furthermore, the attributes may contain information regarding the translation, such as status, origin, language, and quality.
The content database preferably comprises a correlation between information to be translated and at least two translations thereof, wherein the modifying unit is arranged to select one of said at least two translations based on the attributes assigned to the at least two translations. For instance, the modifying unit may select a particular translation because it has the attribute "EN" (for English) where an English translation is needed. The attribute can therefore be a country or language code, and the modifying unit can be arranged to select one of said at least two translations based on the extracted language identifier, such as a language code and/or IP address.
In an embodiment of the present invention, the modifying unit is arranged for modifying the received HTML information by replacing Cascading Style Sheets (CSS) information with different CSS information corresponding to the used and/or desired language. Apart from text content, a webpage contains layout information. This information can be incorporated as in-line CSS statements, or the HTML code may comprise a link to a CSS file stored for instance on the first server. According to the present invention, this layout information can be replaced with layout information specific for the desired and/or used language. The information can be stored for instance on the content server.
Changing the layout information in dependence of the desired and/or used language allows for a different layout to be used for a given language. This is particularly useful when changing between languages which are completely different from each other, or for which a different reading direction must be used. For instance, a webpage optimized for the English language may not be useful for presenting Chinese text and vice-versa. By changing the layout this problem can be obviated.
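As a rough illustration of the layout replacement, the Python sketch below swaps the `href` of a linked stylesheet for a language-specific one. The `STYLESHEETS` mapping, its URLs, and the function name `swap_stylesheet` are hypothetical. The regular expression is a deliberate simplification that only matches `<link>` tags in which `rel="stylesheet"` precedes `href`; in-line CSS statements are not handled.

```python
import re

# Hypothetical mapping from language code to a language-specific stylesheet,
# e.g. stored on the content server.
STYLESHEETS = {
    "zh": "https://content.example.com/css/site.zh.css",
    "ar": "https://content.example.com/css/site.ar.css",
}

# Simplification: rel="stylesheet" must appear before href in the tag.
LINK_RE = re.compile(
    r'(<link\b[^>]*\brel="stylesheet"[^>]*\bhref=")([^"]*)(")',
    re.IGNORECASE,
)


def swap_stylesheet(html: str, language: str) -> str:
    """Point stylesheet links at the stylesheet for the given language."""
    new_href = STYLESHEETS.get(language)
    if new_href is None:
        return html  # no language-specific layout known; keep the original
    return LINK_RE.sub(lambda m: m.group(1) + new_href + m.group(3), html)
```

A production system would more likely rewrite the parsed document than apply a regular expression to raw markup; the sketch only shows where the substitution takes place.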
The present invention also provides a website translator system, comprising a plurality of website translators as previously described. At least one of the content database, the content database management unit, the computer translator, and the server database is preferably arranged as a central unit common to each of the plurality of website translators. This allows multiple website translators to use a common resource. This is particularly advantageous for the content database. As the network of website translators grows, so does the content in the content database. This allows the modifying unit to select a translation from the content database in more cases and for more languages instead of having to resort to a computer translation. Moreover, selecting a translation from the content database will in most cases prove to be a faster solution than providing a computer translation.
The present invention also provides a method for providing a translated version of a webpage of a website in response to a HyperText Transfer Protocol (HTTP) request from a client computer, wherein the website is hosted on a first server having a first host name. The method comprises the following subsequent steps:
a. receiving the HTTP request;
b. extracting a language identifier from the HTTP request and/or from an Internet Protocol (IP) address of the client computer, the language identifier corresponding to a language used and/or desired by a user of the client computer;
c. forwarding the HTTP request to the first server;
d. receiving HTML information from the first server in response to the HTTP request;
e. modifying the received HTML information by replacing information to be translated in the received HTML information with a translated version thereof in correspondence with the used and/or desired language;
f. sending the modified HTML information to the client computer for display as the translated version of the webpage of the website.
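Steps a to f can be sketched as a single function that strings the stages together. This is only an outline under stated assumptions: the three callables passed in (`extract_language`, `fetch_html`, `modify_html`) are hypothetical stand-ins for the extracting, forwarding/receiving, and modifying stages, and the network transport itself is left out.

```python
from typing import Callable, Dict

Headers = Dict[str, str]


def handle_request(
    headers: Headers,
    client_ip: str,
    extract_language: Callable[[Headers, str], str],
    fetch_html: Callable[[Headers], str],
    modify_html: Callable[[str, str], str],
) -> str:
    """Run steps a-f for one request and return the HTML to send back."""
    # a. The request has been received; its headers and the client IP
    #    address are passed in by the caller.
    # b. Determine the language used and/or desired by the user.
    language = extract_language(headers, client_ip)
    # c + d. Forward the request to the first server and receive its HTML.
    original_html = fetch_html(headers)
    # e. Replace the information to be translated.
    translated_html = modify_html(original_html, language)
    # f. The caller sends the returned HTML to the client computer.
    return translated_html
```

Passing the stages in as callables keeps the sketch testable without a network and mirrors the separation into units used in the description.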
The method could further comprise providing a content database having stored therein a correlation between information to be translated and a translation of this information. The replacing of information to be translated then comprises:
if a translation of the information to be translated is available in the content database, replacing the information to be translated with this translation;
if a translation of the information to be translated is not available in the content database, providing a computer translation of this information and replacing this information with the computer translation.
Next, the invention will be described in more detail under reference to the accompanying drawings, wherein:
Figure 1 illustrates an embodiment of a website translator according to the present invention; and
Figure 2 shows a network comprising a plurality of website translators according to the present invention.
Figure 1 shows an embodiment of a website translator according to the present invention. A user may operate a client computer 1 to send a HTTP request to website translator 2. Website translator 2 receives the HTTP request using a first receiving unit 3. The HTTP (HTTP/1.1) request typically comprises a request line, such as "GET /home/index.html HTTP/1.1" and a host header such as "host: www.example.com". This HTTP request would request the webpage www.example.com/home/index.html from a server having the host name www.example.com. Website translator 2 is therefore able to extract the host name from the HTTP request. Furthermore, website translator 2 has an Internet Protocol (IP) address acting as the physical address the HTTP request is sent to. It should be noted that multiple host names can be attributed to a single IP address.
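To make the structure of such a request concrete, the following Python sketch splits a raw HTTP/1.1 request into its request line and header fields. The function name `parse_request` is an assumption, and the parser is intentionally minimal: it ignores the message body and folded headers.

```python
from typing import Dict, Tuple


def parse_request(raw: str) -> Tuple[str, str, str, Dict[str, str]]:
    """Split a raw HTTP/1.1 request into method, path, version and headers."""
    head = raw.split("\r\n\r\n", 1)[0]  # drop the body, if any
    lines = head.split("\r\n")
    method, path, version = lines[0].split(" ", 2)
    headers: Dict[str, str] = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header field names are case-insensitive, so store them lower-cased.
        headers[name.strip().lower()] = value.strip()
    return method, path, version, headers


raw_request = (
    "GET /home/index.html HTTP/1.1\r\n"
    "Host: www.example.com\r\n"
    "Accept-Language: nl\r\n"
    "\r\n"
)
method, path, version, headers = parse_request(raw_request)
# headers["host"] now holds the host name the request was addressed to.
```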
Next, a language identifier which corresponds to a language used and/or desired by the user is extracted from the HTTP request by extracting unit 4. An example is the "Accept-Language" header field in the HTTP request. However, the IP address of client computer 1 may also be used, as this address is related to the geographical location of client computer 1, thereby giving another indication of the language desired and/or used by the user.
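The extraction performed by extracting unit 4 might look like the Python sketch below. It orders the tags of an Accept-Language value by their `q` weights and falls back to an IP-based guess when the header gives no usable tag. The `ip_to_language` table is a hypothetical stand-in for a geolocation lookup, and the default of `"en"` is likewise an assumption.

```python
from typing import Dict, List


def preferred_languages(header_value: str) -> List[str]:
    """Return the language tags of an Accept-Language value, best first."""
    weighted = []
    for position, item in enumerate(header_value.split(",")):
        tag, _, params = item.strip().partition(";")
        tag = tag.strip().lower()
        if not tag:
            continue
        quality = 1.0  # a tag without a q parameter has weight 1
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        if quality > 0:
            # Sort by descending quality, then by original position.
            weighted.append((-quality, position, tag))
    return [tag for _, _, tag in sorted(weighted)]


def language_identifier(
    headers: Dict[str, str],
    client_ip: str,
    ip_to_language: Dict[str, str],
    default: str = "en",
) -> str:
    """Pick a language code from the header, else from the client IP."""
    for tag in preferred_languages(headers.get("accept-language", "")):
        if tag != "*":
            return tag.split("-")[0]  # "nl-nl" -> "nl"
    return ip_to_language.get(client_ip, default)
```

For example, `preferred_languages("en;q=0.5, fr")` places `fr` ahead of `en`, because a tag without an explicit weight counts as 1.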
Website translator 2 comprises a server database 5 having stored therein a relation between the host name extracted from the HTTP request and the host name of the server which hosts the website requested by the user. For instance, such relation could be "example.com" and "original.example.com", meaning that a user requesting a webpage on the server with the host name "example.com" is in fact requesting a translated version of a webpage of a website hosted on "original.example.com".
Forwarding unit 6 uses server database 5 to forward the received HTTP request to first server 7 based on the host name that is correlated with the host name extracted from the received HTTP request.
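The role of server database 5 in forwarding can be sketched in Python as a simple host-name mapping. The dictionary `SERVER_DATABASE` and the function names are illustrative assumptions; the actual forwarding over the network is omitted, and only the rewriting of the Host header is shown.

```python
from typing import Dict, Optional

# Relation between the host name in the request and the first server's host name.
SERVER_DATABASE: Dict[str, str] = {
    "example.com": "original.example.com",
}


def resolve_first_server(headers: Dict[str, str]) -> Optional[str]:
    """Return the host name of the first server for this request, if known."""
    requested_host = headers["host"].split(":")[0].lower()  # drop any port
    return SERVER_DATABASE.get(requested_host)


def forwardable_headers(headers: Dict[str, str]) -> Dict[str, str]:
    """Copy the headers, addressing the request to the first server."""
    target = resolve_first_server(headers)
    if target is None:
        raise LookupError("no first server known for " + headers["host"])
    forwarded = dict(headers)
    forwarded["host"] = target
    return forwarded
```

Because the lookup is keyed on the host name rather than the IP address, several translated host names can share one translator, in line with the remark that multiple host names can be attributed to a single IP address.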
In response, first server 7 will send HTML information, e.g. in the form of a HTML page, to second receiving unit 8 of website translator 2. Typically, a HTML page comprises a plurality of HTML elements. These elements comprise a pair of tags, a start tag and an end tag, as well as some element attributes within the start tag and textual or graphical content between the start and end tags.
Website translator 2 comprises a modifying unit 9 for modifying the received HTML information. It does so by scanning the received HTML information, e.g. the HTML page, looking for tags and extracting the content in between the various start and end tags. If text content is found, modifying unit 9 will consult content database 10 to look for a translation of the extracted text content into the desired and/or used language of the user. Modifying unit 9 will then replace the original text content in the HTML information with the translated version thereof. Consequently, any text content in the requested webpage is translated. Possible forms of text content are single words, phrases or parts thereof, paragraphs, or even entire pages. It should however be noted that it is advantageous to use moderately large units, e.g. phrases or parts thereof, to facilitate the reuse of the translation. On the one hand, parts should be large enough to enable a context to be established between neighboring words, whereas on the other hand the parts should not be too large, as this would severely limit the chance that another webpage comprises an identical piece of text.
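The scanning and replacing performed by modifying unit 9 can be approximated with Python's standard `html.parser` module. This is a sketch under assumptions: the class name `TextReplacer`, the plain dictionary used in place of content database 10, and the choice to treat each run of text between tags as one unit are all illustrative. Text inside `script` and `style` elements is passed through untouched.

```python
from html import escape
from html.parser import HTMLParser
from typing import Dict, List


class TextReplacer(HTMLParser):
    """Rebuild an HTML document, replacing text content that has a translation."""

    def __init__(self, translations: Dict[str, str]):
        super().__init__(convert_charrefs=True)
        self.translations = translations
        self.out: List[str] = []
        self.raw_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        self.out.append(self.get_starttag_text())  # re-emit the tag verbatim
        if tag in ("script", "style"):
            self.raw_depth += 1

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.raw_depth:
            self.raw_depth -= 1
        self.out.append("</" + tag + ">")

    def handle_comment(self, data):
        self.out.append("<!--" + data + "-->")

    def handle_decl(self, decl):
        self.out.append("<!" + decl + ">")

    def handle_data(self, data):
        if self.raw_depth:
            self.out.append(data)  # code or CSS, not text content
            return
        stripped = data.strip()
        translated = self.translations.get(stripped) if stripped else None
        if translated is not None:
            # Keep the surrounding whitespace, swap only the text itself.
            data = data.replace(stripped, translated, 1)
        self.out.append(escape(data, quote=False))


def translate_html(html: str, translations: Dict[str, str]) -> str:
    parser = TextReplacer(translations)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

Segments without a stored translation are left as they are here; a fuller version would hand them to the computer translator instead.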
In addition to text, a webpage may comprise a link to a media file, such as a picture. This picture may contain text information as well. Modifying unit 9 can therefore be arranged to extract the links to these media from the HTML information, and to replace the links with other links that point to corresponding media files, albeit in a different language. These translated versions of these media need not be located on first server 7.
Next, a sending unit 11 sends the modified HTML information to the user's computer 1 for display as the translated version of the website.
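The replacement of media links mentioned above can be illustrated with a short Python sketch. The `MEDIA_TRANSLATIONS` table, its URLs, and the function name are hypothetical, and the regular expression only covers double-quoted `src` attributes.

```python
import re
from typing import Dict, Tuple

# Hypothetical table: (original link, language) -> link to the translated media.
MEDIA_TRANSLATIONS: Dict[Tuple[str, str], str] = {
    ("/img/banner.png", "nl"): "https://media.example.net/nl/banner.png",
}

SRC_RE = re.compile(r'(\bsrc=")([^"]*)(")')


def replace_media_links(html: str, language: str) -> str:
    """Replace media links that have a translated counterpart."""

    def swap(match):
        original = match.group(2)
        replacement = MEDIA_TRANSLATIONS.get((original, language), original)
        return match.group(1) + replacement + match.group(3)

    return SRC_RE.sub(swap, html)
```

As in the text, the translated media files may live on a different server than the original ones, which is why the table stores complete links.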
Website translator 2 may comprise a computer translator 12 for providing the translations of the text content. Website translator 2 may be arranged to retrieve the required translation directly from computer translator 12 instead of accessing content database 10. However, it is advantageous if modifying unit 9 first inspects content database 10 to look for a translation and uses such translation if it is present. In case such translation is not present, computer translator 12 may provide one.
Website translator 2 may comprise a content database management unit 13 to manage the addition of translations to, and their deletion from, content database 10, as well as the access to it. To that end, it may receive translations for text content from one or more human translators 14 and/or from computer translator 12.
In order to organize content database 10, content database management unit 13 applies a hashing function to the information to be translated. For instance, the word "hello" becomes the hash 0x1245fd4e5, being a hexadecimal number. In content database 10, this hash is linked to a translation. Whenever modifying unit 9 needs a translation of a piece of text, it will consult content database 10 via content database management unit 13, wherein the content database management unit 13 will first hash the information to be translated and subsequently look for a corresponding translation in content database 10. If such translation is found, that translation is returned to modifying unit 9. In case such translation is not found, computer translator 12 is consulted to provide one. This translation may subsequently be linked by the content database management unit 13 to the hash of the information to be translated to enable future use.
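The lookup-then-fallback behaviour of content database management unit 13 can be sketched as a small Python class. Everything about its shape is an assumption: the class name, the use of SHA-256, the in-memory dictionary standing in for content database 10, and the `computer_translate` callable standing in for computer translator 12.

```python
import hashlib
from typing import Callable, Dict, Tuple


class ContentDatabaseManager:
    """Hash segments, look them up, and fall back to a computer translation."""

    def __init__(self, computer_translate: Callable[[str, str], str]):
        # Key: (hash of the segment, language); value: the translation.
        self._entries: Dict[Tuple[str, str], str] = {}
        self._computer_translate = computer_translate

    @staticmethod
    def _hash(text: str) -> str:
        return hashlib.sha256(text.encode("utf-8")).hexdigest()

    def store(self, text: str, language: str, translation: str) -> None:
        """Link a translation (human or computer) to the segment's hash."""
        self._entries[(self._hash(text), language)] = translation

    def translate(self, text: str, language: str) -> str:
        key = (self._hash(text), language)
        if key in self._entries:
            return self._entries[key]  # translation found in the database
        translation = self._computer_translate(text, language)
        self._entries[key] = translation  # keep it for future use
        return translation
```

With this arrangement the computer translator is consulted at most once per segment and language; later requests are served from the stored entry.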
As shown in figure 1, several sources of translations exist. Moreover, for each piece of information to be translated, several translations in several languages may exist. For instance, an English phrase may have a corresponding translation in Dutch as well as three translations in Spanish. Of these three translations in Spanish, two are generated by a person, whereas one is generated by computer translator 12. To distinguish between these translations, attributes may be assigned to them. For instance, a Spanish translation may contain the following attributes: [C/H] for computer or human translation, [Am/Eu] for South American Spanish or European Spanish, and [H/M/L] for setting a level of quality ranging from (h)igh, to (m)edium, to (l)ow. Modifying unit 9 can be arranged to select one of the available translations based on the attributes assigned to them. For instance, if the IP address of client computer 1 reveals that the user is located in Chile, modifying unit 9 will use the South American Spanish translation.
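Attribute-based selection can be sketched in Python as follows. The attribute codes follow the example above, while the `Translation` record and the ranking order (matching region first, then quality, then human before computer) are assumptions; the text only says that the selection is based on the attributes. The translation texts are placeholders.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Translation:
    text: str
    language: str  # e.g. "es" or "nl"
    origin: str    # "H" for human, "C" for computer
    region: str    # "Am", "Eu", or "" when not applicable
    quality: str   # "H", "M" or "L"


QUALITY_RANK = {"H": 0, "M": 1, "L": 2}


def select_translation(
    candidates: List[Translation], language: str, region: Optional[str] = None
) -> Optional[Translation]:
    """Pick one translation for the language, preferring a matching region."""
    matching = [c for c in candidates if c.language == language]
    if not matching:
        return None

    def rank(candidate: Translation):
        # Assumed priority: region match, then quality, then human origin.
        return (
            0 if region and candidate.region == region else 1,
            QUALITY_RANK.get(candidate.quality, 3),
            0 if candidate.origin == "H" else 1,
        )

    return min(matching, key=rank)
```

A user whose IP address points to Chile would be served with `region="Am"`, so a South American Spanish entry wins even if a European Spanish entry has a higher quality level.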
The website translator according to the present invention is particularly well suited to handle dynamic content, i.e. a webpage for which the text content changes often. Normally, when text content is extracted during the modification process, it is determined whether a translation is available in content database 10. In case a translation is not available, for instance because the content of the original webpage has changed, the system can be configured to provide a computer translation. In addition, a request to a human translator may be issued. At a later stage, the translation provided by the human translator can be used in addition to or instead of the computer translation. In this way, the translated version of the webpage is always synchronous with the original webpage.
The present invention can easily be scaled, wherein a plurality of website translators 2 are arranged to each provide a translation of at least one website hosted on different first servers. Such network is illustrated in figure 2. Each of the two depicted website translators 2 is arranged to provide a translation of two different websites that are hosted on two different first servers. Again, the information in the server database is used to correctly ascertain which HTML information is requested.
As the website translators 2 all serve the same purpose, it is advantageous to combine common components. For instance, the content database, the content database management unit, and the server database can be arranged as common central units accessible to each website translator. Such arrangement allows the various website translators to benefit from each other: a website translator can use a translation for a given piece of text content that was originally translated for a different website translator. Such arrangement also enables a centralized approach for modifying and maintaining the various databases.
Although the invention has been described in detail, it should be obvious to the skilled person in the art that various modifications are possible to the embodiments shown without departing from the scope of the invention which is defined by the appended claims.

Claims

1. A website translator arranged for providing a translated version of a webpage of a website in response to a HyperText Transfer Protocol (HTTP) request from a client computer, wherein the website is hosted on a first server having a first host name, the website translator comprising:
a first receiving unit arranged for receiving the HTTP request;
an extracting unit arranged for extracting a language identifier from the HTTP request and/or from an Internet Protocol (IP) address of the client computer, the language identifier corresponding to a language used and/or desired by a user of the client computer;
a forwarding unit for forwarding the HTTP request to the first server;
a second receiving unit arranged for receiving HTML information from the first server in response to the HTTP request;
a modifying unit for modifying the received HTML information by replacing information to be translated in the received HTML information with a translated version thereof in correspondence with the used and/or desired language;
a sending unit for sending the modified HTML information to the client computer for display as the translated version of the webpage of the website.
2. The website translator according to claim 1, further comprising a server database having stored therein a correlation between a host name extracted from the HTTP request and a host name of the first server, and wherein the extracting unit is arranged to extract a host name from the HTTP request, the forwarding unit being arranged to forward the HTTP request to the first server having a host name that correlates with the host name extracted from the HTTP request.
3. The website translator according to claim 2, wherein the website translator has an Internet Protocol (IP) address linked to a plurality of host names, and wherein the server database comprises a correlation for each of the host names with a host name of a respective first server.
4. The website translator according to any one of the claims 1-3, wherein the modifying unit is arranged for extracting text content in the HTTP request.
5. The website translator according to claim 4, wherein the information to be translated comprises a word, a phrase or part thereof, or a paragraph of the extracted text content.
6. The website translator according to any one of the previous claims, wherein the modifying unit is arranged for extracting a link to media content in the HTML information and to replace the link with a link to a translated version of the media content.
7. The website translator according to any one of the previous claims, further comprising a computer translator for providing the translated version as a computer translation of the information to be translated.
8. The website translator according to any one of the previous claims, further comprising a content database having stored therein a correlation between information to be translated and a translation of this information.
9. The website translator according to claim 8, wherein the modifying unit is arranged to replace the information to be translated in the HTML information with the translation of this information if this translation is available in the content database.
10. The website translator according to claims 7 and 9, wherein the modifying unit is arranged to control the computer translator to provide a computer translation of the information to be translated and to replace the information to be translated in the HTML information with the computer translation of this information if the content database does not comprise a translation of this information.
11. The website translator according to any one of claims 8-10, further comprising a content database management unit arranged for storing a computer translation of information to be translated in the content database.
12. The website translator according to claim 11, wherein the content database management unit is arranged for storing an externally received translation of the information to be translated in the content database.
13. The website translator according to any one of the claims 8-12, wherein the content database comprises a plurality of entries, each entry comprising a correlation between a hash of the information to be translated and at least one translation of this information.
14. The website translator according to claim 11 or 12 and claim 13, wherein the content database management unit is arranged to hash the information to be translated by applying a predefined hash function to this information and to correlate the hashed information with said at least one translation of this information.
15. The website translator according to claim 14, wherein the content database management unit is further arranged to assign an attribute to the translation of the information to be translated.
16. The website translator according to claim 15, wherein the content database comprises a correlation between information to be translated and at least two translations thereof, wherein the modifying unit is arranged to select one of said at least two translations based on the attributes assigned to said at least two translations.
17. The website translator according to claim 16, wherein the attribute is a country or language code, and wherein the modifying unit is arranged to select one of said at least two translations based on the extracted language identifier.
18. The website translator according to any of the previous claims, wherein the modifying unit is arranged for modifying the received HTML information by replacing Cascaded Style Sheet (CSS) information with different CSS information corresponding to the used and/or desired language.
19. A website translator system, comprising a plurality of website translators as defined in any one of the previous claims, wherein at least one of the content database, the content database management unit, the computer translator, and the server database is arranged as a central unit common to each of said plurality of website translators.
20. A method for providing a translated version of a webpage of a website in response to a HyperText Transfer Protocol (HTTP) request from a client computer, wherein the website is hosted on a first server having a first host name, said method comprising:
receiving the HTTP request;
extracting a language identifier from the HTTP request and/or from an Internet Protocol (IP) address of the client computer, the language identifier corresponding to a language used and/or desired by a user of the client computer;
forwarding the HTTP request to the first server;
receiving HTML information from the first server in response to the HTTP request;
modifying the received HTML information by replacing information to be translated in the received HTML information with a translated version thereof in correspondence with the used and/or desired language;
sending the modified HTML information to the client computer for display as the translated version of the webpage of the website.
21. The method according to claim 20, further comprising:
providing a content database having stored therein a correlation between information to be translated and a translation of this information;
wherein said replacing information to be translated comprises:
if a translation of the information to be translated is available in the content database, replacing the information to be translated with this translation;
if a translation of the information to be translated is not available in the content database, providing a computer translation of this information and replacing this information with the computer translation.
PCT/NL2012/000016 2011-02-24 2012-02-24 Website translator, system and method WO2012115507A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
NL2006294 2011-02-24
NL2006294A NL2006294C2 (en) 2011-02-24 2011-02-24 Website translator, system, and method.

Publications (1)

Publication Number Publication Date
WO2012115507A1 true WO2012115507A1 (en) 2012-08-30

Family

ID=44514325

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/NL2012/000016 WO2012115507A1 (en) 2011-02-24 2012-02-24 Website translator, system and method

Country Status (2)

Country Link
NL (1) NL2006294C2 (en)
WO (1) WO2012115507A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040167784A1 (en) * 2003-02-21 2004-08-26 Motionpoint Corporation Dynamic language translation of web site content
US20090157381A1 (en) * 2007-12-12 2009-06-18 Microsoft Corporation Web translation provider
US20090192783A1 (en) * 2008-01-25 2009-07-30 Jurach Jr James Edward Method and System for Providing Translated Dynamic Web Page Content
KR20100091923A (en) * 2009-02-10 2010-08-19 오의진 Method of servicing translation of web page written in many languages
WO2012009441A2 (en) * 2010-07-13 2012-01-19 Motionpoint Corporation Dynamic language translation of web site content

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11367131B2 (en) 2012-12-07 2022-06-21 United Parcel Service Of America, Inc. Systems and methods of website integration
US11593867B2 (en) 2012-12-07 2023-02-28 United Parcel Service Of America, Inc. Systems and methods of website integration
US10296968B2 (en) 2012-12-07 2019-05-21 United Parcel Service Of America, Inc. Website augmentation including conversion of regional content
US10311504B2 (en) 2012-12-07 2019-06-04 United Parcel Service Of America, Inc. Website augmentation including conversion of regional content
US10719871B2 (en) 2012-12-07 2020-07-21 United Parcel Service Of America, Inc. Systems and methods of website integration
US20150261880A1 (en) * 2014-03-15 2015-09-17 Google Inc. Techniques for translating user interfaces of web-based applications
WO2016010633A1 (en) * 2014-07-16 2016-01-21 United Parcel Service Of America, Inc. Language content translation
US9965466B2 (en) 2014-07-16 2018-05-08 United Parcel Service Of America, Inc. Language content translation
EP3441887A4 (en) * 2016-04-04 2019-12-18 Wovn Technologies, Inc. Translation system
CN109074326B (en) * 2016-04-04 2022-02-18 沃文技术株式会社 Translation system
US10878203B2 (en) 2016-04-04 2020-12-29 Wovn Technologies, Inc. Translation system
CN109074326A (en) * 2016-04-04 2018-12-21 沃文技术株式会社 translation system
WO2021048659A1 (en) * 2019-09-11 2021-03-18 International Business Machines Corporation Translation of multi-format embedded files
GB2601463A (en) * 2019-09-11 2022-06-01 Ibm Translation of multi-format embedded files
US11373048B2 (en) 2019-09-11 2022-06-28 International Business Machines Corporation Translation of multi-format embedded files

Also Published As

Publication number Publication date
NL2006294C2 (en) 2012-08-27

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 12712381

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 12712381

Country of ref document: EP

Kind code of ref document: A1