WO2016206336A1 - File extraction and restoration method favorable for translation work - Google Patents

File extraction and restoration method favorable for translation work

Info

Publication number
WO2016206336A1
Authority
WO
WIPO (PCT)
Prior art keywords
translation
document
sentence
translator
translated
Application number
PCT/CN2015/098668
Other languages
French (fr)
Chinese (zh)
Inventor
江潮
罗伟峰
Original Assignee
武汉传神信息技术有限公司
Application filed by 武汉传神信息技术有限公司
Publication of WO2016206336A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language

Definitions

  • the invention relates to an artificial intelligence and document processing method which is convenient for translation work.
  • the technical problem to be solved by the invention is to simplify the translation work and improve the translation efficiency, and propose a file extraction and restoration method which is beneficial to the translation work.
  • the file extraction and restoration method proposed by the present invention for facilitating translation work includes the following steps:
  • the translator processing document has three fields, "original", "translation" and id; the "original" field holds the original text of a sentence, and the "translation" field holds its translation;
  • disassembling the document object to be translated into a set of data to be translated with a sentence as the minimum unit includes the following steps:
  • the Aspose component provides paragraph objects, child node objects, and Run objects that facilitate character operations; a Run object is a contiguous set of character segments within the document that share the same character format.
  • merging a Run object containing only one sentence fragment into a subsequent Run object includes the following steps:
  • the invention also includes creating a dictionary object whose key is the original text and whose value is the translation, so that each original-translation pair forms a key-value pair; while the translator processing document is traversed, the original text and translation of each record are written to the dictionary object.
  • in step 5), if the translation field of the record for an id is empty, the original text of that record is used as a key to look up a matching translation value in the dictionary object; if one is found, the translation field is filled with it.
  • the translator processing document is traversed and repeated sentences are marked, to remind the translator that they need not be translated again.
  • the translator processing document is traversed and the sentences of the original text are automatically matched against the terms in the termbase; if a term matches, the sentence containing it is annotated, making translation work smoother.
  • the translator processing document is traversed and the sentences of the original text are compared one by one with the entries in the corpus; if a sentence matches, the corpus translation is filled into the "translation" field of the matching sentence.
  • the present invention simplifies the translator's work: the translator does not need to master the handling of mainstream document programs such as PPT, Word, Excel and PDF, and can devote more energy to translating the text itself.
  • all repeated sentences need to be translated only once, and the others are filled in automatically; the results of each translation job are collected, so that when a new manuscript is received, the previously accumulated corpus and terminology can be used directly, further improving translation efficiency.
  • FIG. 1 is a screenshot of a translator translation processing interface according to a specific embodiment of the present invention. The figure mainly shows a translator processing document filled with the original text.
  • FIG. 2 is a screenshot of another translator translation processing interface according to a specific embodiment of the present invention. The figure mainly shows a translator processing document that has been pre-processed.
  • Figure 3 is an overall flow chart of the present invention.
  • the method for extracting and restoring files for translation work proposed by the present invention comprises the following steps:
  • the translator processing document has three fields, "original", "translation" and id; the "original" field holds the original text of a sentence, and the "translation" field holds its translation;
  • the paragraph object contains all the text information of the document object, and does not include symbols, images or other non-text information that does not need to be translated;
  • the Aspose component provides paragraph objects, child node objects, and Run objects that facilitate character operations.
  • the Run object is a collection of characters in a consistent character format within a document.
  • the Run objects obtained fall into four cases: (1) a Run object contains several complete sentences; (2) a Run object contains several complete sentences and one sentence fragment; (3) a Run object contains only one sentence fragment; (4) a Run object contains exactly one complete sentence. Further sentence segmentation is therefore required: the existing Run objects are split and merged so that each resulting Run object contains exactly one complete sentence.
  • S4: traverse each Run object and split all Run objects so that each contains either exactly one complete sentence or only one sentence fragment.
  • the approach used is, for example:
  • if a Run object contains several complete sentences, it is split at the sentence terminators into several Run objects that each contain exactly one complete sentence.
  • if a Run object contains several complete sentences and one sentence fragment, it is split at the sentence terminators into several Run objects that each contain exactly one complete sentence, plus one Run object containing the sentence fragment.
  • Run-1: "To solve the above problem, a method is proposed that converts "
  • Run-2: "Word, Excel, PPT, PDF"
  • Run-3: " and other mainstream document formats into a "
  • Run-4: "Word"
  • Run-5: " document of a unified standard style and can conversely restore the resulting standard "
  • Run-6: "Word"
  • Run-7: " document to the original manuscript format. This simplifies translation work and improves translation efficiency."
  • Run-1 to Run-6 above each contain only one sentence fragment, and Run-7 contains two seemingly complete sentences.
  • Run-1 to Run-6 need to be merged, and Run-7 needs to be split further.
  • S5-1: take out the character content of the Run object that is only a sentence fragment, store it in a temporary storage unit, and then delete that Run object from the paragraph object;
  • S5-2: check the next Run object. If its character content is only a sentence fragment, take out that content, append it to the temporary storage unit, delete the Run object from the paragraph object, and go on to check the Run object after it; otherwise, take the character content out of the temporary storage unit, insert it before the character content of that next Run object, and then empty the temporary storage unit.
  • the translator processing document is delivered to the translator, who translates the text of each "original" field in turn and enters the result in the corresponding "translation" field until processing is complete;
  • the key of the dictionary object is the original text and the value is the translation, so that each original-translation pair forms a key-value pair; while the translator processing document is traversed, the original text and translation of each record are written to the dictionary object.
  • the original text of the record for the id is used as a key to look up a matching translation value; if one is found, the translation field is filled with it.
  • for the Excel, PPT and PDF documents mentioned in the present invention, those skilled in the art can use the Aspose component to obtain the character information contained in these documents and, following the method disclosed by the present invention, split and merge it into a data set with the sentence as the unit and generate the translator processing document, which is convenient for translators to work on; after the translator has finished, the translations are processed to produce the translated document.
  • for example, an Excel, PDF or PPT document can be converted into a corresponding Word document with an existing tool and then processed by the method of the above embodiment.
  • Excel documents can also be processed directly with the Aspose component.

Abstract

An artificial intelligence file processing method favorable for translation work. The method comprises: using the Aspose component's support for file processing operations, disassembling a file object to be translated into a set of data to be translated in which a single sentence is the minimum unit; establishing a standard translator processing document and copying each sentence in the set of data to be translated into it one by one; a translator filling translations into the translator processing document one by one; traversing the set of data to be translated and the translator processing document and writing the translations into the set of data to be translated; and restoring the set of data to be translated to a document in the original manuscript format. Manuscripts in various formats can be converted into a standard translator processing document. Sentences that appear multiple times do not need to be translated repeatedly, the translator's work is simplified, translation efficiency is improved, the extraction and restoration logic executes efficiently, and the restored translated manuscript preserves the original manuscript format.

Description

File extraction and restoration method that facilitates translation work
Technical field
The invention relates to an artificial intelligence and document processing method that facilitates translation work.
Background
As China has become the world's second largest economy and strategies such as the "Belt and Road" initiative are steadily implemented, China's ties with the world are growing closer in every field. In the course of internationalization, the market for the language support services needed for communication among countries keeps growing, which brings new opportunities and challenges to the translation industry.
Practitioners in the translation industry face a large volume of manuscripts in various formats every day. Because manuscripts come in so many types, translators need to master document programs such as Word, Excel, PPT and PDF as well as various computer-assisted translation tools. This is a considerable challenge and barrier for full-time translators, and such problems have clearly hindered the development of the industry and even the process of China's globalization.
Therefore, a method is needed that converts multiple mainstream document formats into a document of a unified standard style and can, conversely, restore the converted standard document to the original manuscript format, so as to simplify translation work and improve translation efficiency.
Summary of the invention
The technical problem to be solved by the invention is to simplify translation work and improve translation efficiency; to that end, a file extraction and restoration method that facilitates translation work is proposed.
To solve the above technical problem, the file extraction and restoration method proposed by the present invention includes the following steps:
1) Using the document-processing support of the Aspose dynamic link library, disassemble the document object to be translated into a set of data to be translated in which a single sentence is the minimum unit;
2) Create a translator processing document with three fields, "original", "translation" and id, where the "original" field holds the original text of a sentence and the "translation" field holds its translation;
3) Copy each sentence in the set of data to be translated, in order, into the "original" field of the translator processing document, then replace the content of that sentence in the set of data to be translated with a unique placeholder Guid, where adjacent placeholder Guids have different character formats; the content of the id field is in one-to-one correspondence with the distinct Guids;
4) Deliver the translator processing document to a translator, who translates the text of each "original" field in turn and enters the result in the corresponding "translation" field until processing is complete;
5) Traverse the set of data to be translated and the translator processing document, find the translation for the id corresponding to each Guid, and overwrite the position of that Guid in the set of data to be translated with the translation;
6) Call the Aspose dynamic link library to restore the set of data to be translated to a document in the original manuscript format.
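Steps 3) and 5) can be illustrated with a minimal Python sketch. It assumes the set of data to be translated has already been reduced to an ordered list of sentence strings; the names `extract` and `restore` and the record layout are hypothetical, and neither the Aspose calls nor the differing character formats of adjacent placeholders are modelled.

```python
import uuid


def extract(sentences):
    """Swap every sentence for a unique placeholder (step 3).

    Returns the placeholder skeleton, the translator records and the
    id -> Guid mapping. All names here are illustrative assumptions.
    """
    skeleton, records, id_to_guid = [], [], {}
    for record_id, sentence in enumerate(sentences, start=1):
        guid = str(uuid.uuid4())        # unique placeholder
        skeleton.append(guid)           # stays behind in the data set
        id_to_guid[record_id] = guid    # one-to-one id <-> Guid mapping
        records.append({"id": record_id, "original": sentence, "translation": ""})
    return skeleton, records, id_to_guid


def restore(skeleton, records, id_to_guid):
    """Overwrite each placeholder with the translation of its id (step 5)."""
    guid_to_text = {id_to_guid[r["id"]]: r["translation"] for r in records}
    return [guid_to_text[guid] for guid in skeleton]


skeleton, records, id_to_guid = extract(["你好。", "谢谢。"])
records[0]["translation"] = "Hello."      # filled in by the translator (step 4)
records[1]["translation"] = "Thank you."
restored = restore(skeleton, records, id_to_guid)
```

Because the skeleton keeps only placeholders, the translator never touches the source document; the order of the restored text follows the skeleton, not the order of the records.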
Disassembling the document object to be translated into a set of data to be translated with a sentence as the minimum unit includes the following steps:
1-1 Call the Aspose component;
1-2 Traverse the document object to obtain all paragraph objects; the paragraph objects contain all the text information of the document object and exclude symbols, images and other non-text information that does not need translation;
1-3 Traverse the child node objects of each paragraph object to obtain a number of character set objects (Run). The Aspose component provides paragraph objects, child node objects, and Run objects that facilitate character operations; a Run object is a contiguous set of character segments within the document that share the same character format.
1-4 Traverse each Run object and split all Run objects so that each contains either exactly one complete sentence or only one sentence fragment;
1-5 Traverse each Run object and merge each Run object that contains only a sentence fragment into the subsequent Run object that contains exactly one complete sentence.
On completion, a set of Run objects is obtained in which the sentence is the minimum unit and each Run object contains exactly one complete sentence.
Merging a Run object that contains only one sentence fragment into a subsequent Run object includes the following steps:
1-4-1 Take out the character content of the Run object that is only a sentence fragment, store it in a temporary storage unit, and then delete that Run object from the paragraph object;
1-4-2 Check the next Run object. If its character content is only a sentence fragment, take out that content, append it to the temporary storage unit, delete the Run object from the paragraph object, and go on to check the Run object after it; otherwise, take the character content out of the temporary storage unit, insert it before the character content of that next Run object, and then empty the temporary storage unit.
1-4-3 If the character content of the next Run object ends with a sentence terminator, take out the character content held in the temporary storage unit, insert it before the character content of that next Run object, and then empty the temporary storage unit.
The invention also includes creating a dictionary object whose key is the original text and whose value is the translation, so that each original-translation pair forms a key-value pair; while the translator processing document is traversed, the original text and translation of each record are written to the dictionary object.
In step 5), if the translation field of the record for an id is empty, the original text of that record is used as a key to look up a matching translation value in the dictionary object; if one is found, the translation field is filled with it.
If no matching translation value is found in the dictionary object, the sentence has been missed by the translator; the original text is filled in directly so that the reviewer can easily spot the omission.
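The dictionary lookup and its fallback can be sketched as follows. This is a minimal illustration under stated assumptions: records are plain dictionaries with hypothetical keys, and the dictionary object is built in a first pass so that a translation entered anywhere in the document can fill an empty record; the source does not say whether one or two passes are used.

```python
def fill_missing(records):
    """Fill empty translations from an original -> translation dictionary."""
    memory = {}
    for record in records:                    # build the dictionary object
        if record["translation"]:
            memory[record["original"]] = record["translation"]
    for record in records:                    # fill the gaps
        if not record["translation"]:
            # No match: fall back to the original text so that a reviewer
            # can spot the missed sentence.
            record["translation"] = memory.get(record["original"], record["original"])
    return records


records = fill_missing([
    {"id": 1, "original": "常见问题", "translation": "FAQ"},
    {"id": 2, "original": "常见问题", "translation": ""},
    {"id": 3, "original": "SDK下载", "translation": ""},
])
```

Record 2 reuses the translation entered for record 1, while record 3 has no entry in the dictionary and therefore keeps its original text.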
Further, before the translator processing document is delivered to the translator, the translator processing document is traversed and repeated sentences are marked, to remind the translator that they need not be translated again.
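A minimal sketch of this marking pass, assuming records are dictionaries with a hypothetical `original` key and that "repeated" means an exact string match with an earlier record:

```python
def mark_repeats(records):
    """Flag every record whose original text already appeared earlier."""
    seen = set()
    for record in records:
        record["repeat"] = record["original"] in seen
        seen.add(record["original"])
    return records


rows = mark_repeats([
    {"original": "Android集成指南"},
    {"original": "SDK下载"},
    {"original": "Android集成指南"},
])
```

Only the second and later occurrences are flagged, so the first occurrence remains the one the translator actually translates.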
Further, before the translator processing document is delivered to the translator, the translator processing document is traversed and the sentences of the original text are automatically matched against the terms in the termbase; if a term matches, the sentence containing it is annotated, making translation work smoother.
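The term annotation can be sketched with a plain substring search. The termbase layout (a mapping from source term to target term), the `notes` field and the sample entries are all assumptions made for illustration; the source does not describe how the termbase is stored or how annotations are shown.

```python
def annotate_terms(records, termbase):
    """Attach a note for every termbase entry found in a sentence."""
    for record in records:
        record["notes"] = [
            f"{term} -> {target}"
            for term, target in termbase.items()
            if term in record["original"]      # simple substring match
        ]
    return records


# Hypothetical sample termbase, not taken from the source.
rows = annotate_terms(
    [{"original": "Android集成指南"}, {"original": "常见问题"}],
    {"集成": "integration", "指南": "guide"},
)
```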
Still further, before the translator processing document is delivered to the translator, the translator processing document is traversed and the sentences of the original text are compared one by one with the entries in the corpus; if a sentence matches, the corpus translation is filled into the "translation" field of the matching sentence.
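The corpus pre-fill can be sketched as below. The embodiment distinguishes 100% matches from partial ones, which are pre-filled but flagged for the translator; how similarity is measured is not stated, so this sketch swaps in `difflib.SequenceMatcher` from the Python standard library and an assumed threshold of 0.75. The `match` field and the function name are hypothetical.

```python
from difflib import SequenceMatcher

FUZZY_THRESHOLD = 0.75  # assumption: the source gives no threshold


def prefill_from_corpus(records, corpus):
    """Pre-fill translations from a corpus of original -> translation pairs."""
    for record in records:
        original = record["original"]
        if original in corpus:                       # 100% match
            record["translation"] = corpus[original]
            record["match"] = "exact"
            continue
        # Otherwise look for the most similar corpus entry.
        best, best_score = None, 0.0
        for source in corpus:
            score = SequenceMatcher(None, original, source).ratio()
            if score > best_score:
                best, best_score = source, score
        if best is not None and best_score >= FUZZY_THRESHOLD:
            record["translation"] = corpus[best]     # offered for reference
            record["match"] = "fuzzy"                # still needs the translator
        else:
            record["match"] = "none"
    return records


rows = prefill_from_corpus(
    [
        {"original": "SDK下载", "translation": ""},
        {"original": "常见问题解答", "translation": ""},
        {"original": "完全不同的句子", "translation": ""},
    ],
    {"SDK下载": "SDK download", "常见问题": "FAQ"},
)
```

A real corpus would be far larger than this two-entry dictionary, and a linear scan with `SequenceMatcher` would then be too slow; the sketch only shows the exact/partial/none distinction.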
Advantageous effects: the present invention simplifies the translator's work. The translator does not need to master the handling of mainstream document programs such as PPT, Word, Excel and PDF, and can devote more energy to translating the text itself. In addition, the manuscript to be translated is automatically pre-analysed during processing and repeated sentences are found and marked, so that each repeated sentence needs to be translated only once and the other occurrences are filled in automatically. The results of each translation job are collected, so that when a new manuscript is received, the previously accumulated corpus and terminology can be used directly, further improving translation efficiency.
Brief description of the drawings
The technical solution of the present invention is described in further detail below with reference to the drawings and specific embodiments.
FIG. 1 is a screenshot of the translator's translation processing interface according to a specific embodiment of the present invention; it mainly shows a translator processing document filled with the original text.
FIG. 2 is a screenshot of another translation processing interface according to a specific embodiment of the present invention; it mainly shows a translator processing document that has been pre-processed.
FIG. 3 is the overall flow chart of the present invention.
Detailed description
The file extraction and restoration method that facilitates translation work, as proposed by the present invention, includes the following steps:
1) Using the document-processing support of the Aspose dynamic link library, disassemble the document object to be translated into a set of data to be translated in which a single sentence is the minimum unit;
2) Create a translator processing document with three fields, "original", "translation" and id, where the "original" field holds the original text of a sentence and the "translation" field holds its translation;
3) Copy each sentence in the set of data to be translated, in order, into the "original" field of the translator processing document, then replace the content of that sentence in the set of data to be translated with a unique placeholder Guid, where adjacent placeholder Guids have different character formats; the content of the id field is in one-to-one correspondence with the distinct Guids;
4) Deliver the translator processing document to a translator, who translates the text of each "original" field in turn and enters the result in the corresponding "translation" field until processing is complete;
5) Traverse the set of data to be translated and the translator processing document, find the translation for the id corresponding to each Guid, and overwrite the position of that Guid in the set of data to be translated with the translation;
6) Call the Aspose dynamic link library to restore the set of data to be translated and generate a translated manuscript that the document processing tool can recognize.
For a better understanding of the present invention, the translation process is described in detail below, taking the processing and translation of a Word document as an example:
S1. Call the Aspose component;
S2. Traverse the Word document object to be translated to obtain all paragraph objects; the paragraph objects contain all the text information of the document object and exclude symbols, images and other non-text information that does not need translation;
S3. Traverse the child node objects of each paragraph object to obtain a number of character set objects (Run);
The Aspose component provides paragraph objects, child node objects, and Run objects that facilitate character operations; a Run object is a contiguous set of characters within the document that share the same character format.
The Run objects obtained fall into four cases: (1) a Run object contains several complete sentences; (2) a Run object contains several complete sentences and one sentence fragment; (3) a Run object contains only one sentence fragment; (4) a Run object contains exactly one complete sentence. Further sentence segmentation is therefore required: the existing Run objects are split and merged so that each resulting Run object contains exactly one complete sentence.
S4. Split: traverse each Run object and split all Run objects so that each contains either exactly one complete sentence or only one sentence fragment. The approach used is, for example:
Starting from the first Run object, check the character content of the Run object.
If a Run object contains exactly one complete sentence, or only one sentence fragment, go straight on to check the next Run object;
If a Run object contains several complete sentences, split it at the sentence terminators into several Run objects that each contain exactly one complete sentence.
If a Run object contains several complete sentences and one sentence fragment, split it at the sentence terminators into several Run objects that each contain exactly one complete sentence, plus one Run object containing the sentence fragment.
For example, take the paragraph "To solve the above problem, a method is proposed that converts Word, Excel, PPT, PDF and other mainstream document formats into a Word document of a unified standard style and can conversely restore the resulting standard Word document to the original manuscript format. This simplifies translation work and improves translation efficiency."
Applying the Aspose component to this paragraph and traversing the paragraph object yields several character set objects (Run), in order: Run-1: "To solve the above problem, a method is proposed that converts ", Run-2: "Word, Excel, PPT, PDF", Run-3: " and other mainstream document formats into a ", Run-4: "Word", Run-5: " document of a unified standard style and can conversely restore the resulting standard ", Run-6: "Word", Run-7: " document to the original manuscript format. This simplifies translation work and improves translation efficiency."
Clearly, Run-1 to Run-6 each contain only one sentence fragment, while Run-7 contains two seemingly complete sentences. To produce a data set whose unit is the complete sentence, Run-1 to Run-6 need to be merged and Run-7 needs to be split further.
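The splitting of S4 can be sketched on an English rendering of Run-7. This is a minimal illustration that models a Run as a plain string; the set of sentence terminators is an assumption (the source does not list them), and a naive split on "." would also break on abbreviations or decimal numbers.

```python
import re

# Assumption: these characters end a sentence; the source does not list them.
TERMINATORS = "。！？.!?"
# A piece is either text up to and including its terminator(s),
# or a trailing fragment without any terminator.
_PIECE = re.compile(rf"[^{TERMINATORS}]*[{TERMINATORS}]+|[^{TERMINATORS}]+")


def split_run(text):
    """Split a Run's text at sentence terminators without losing characters."""
    return _PIECE.findall(text)


run_7 = (" document to the original manuscript format."
         " This simplifies translation work and improves translation efficiency.")
pieces = split_run(run_7)
```

Here `pieces` holds two strings: the first is still only the tail of a sentence and has to be merged with Run-1 to Run-6, while the second is a complete sentence on its own.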
S5. Merge: traverse each Run object and merge each Run object that contains only a sentence fragment into the subsequent Run object that contains exactly one complete sentence. Specifically, this includes the following steps:
S5-1 Take out the character content of the Run object that is only a sentence fragment, store it in a temporary storage unit, and then delete that Run object from the paragraph object;
S5-2 Check the next Run object. If its character content is only a sentence fragment, take out that content, append it to the temporary storage unit, delete the Run object from the paragraph object, and go on to check the Run object after it; otherwise, take the character content out of the temporary storage unit, insert it before the character content of that next Run object, and then empty the temporary storage unit.
S5-3 If the character content of the next Run object ends with a sentence terminator, take out the character content held in the temporary storage unit, insert it before the character content of that next Run object, and then empty the temporary storage unit.
On completion, a set of Run objects is obtained in which the sentence is the minimum unit and each Run object contains exactly one complete sentence.
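The merging of S5 can be sketched with a string buffer standing in for the temporary storage unit. Runs are again modelled as plain strings rather than Aspose objects, the terminator set is an assumption, and keeping a trailing fragment that never meets a terminator is a choice of this sketch, not something the source specifies.

```python
# Assumption: these characters end a sentence; the source does not list them.
TERMINATORS = ("。", "！", "？", ".", "!", "?")


def merge_fragments(run_texts):
    """Merge sentence fragments into the next Run that ends a sentence."""
    merged, buffer = [], ""
    for text in run_texts:
        if text.rstrip().endswith(TERMINATORS):
            merged.append(buffer + text)   # prepend the buffered fragments
            buffer = ""                    # empty the temporary storage unit
        else:
            buffer += text                 # fragment: store it and move on
    if buffer:
        merged.append(buffer)              # trailing fragment, kept as is
    return merged


runs = [
    "To solve the above problem, a method is proposed that converts ",
    "Word, Excel, PPT, PDF",
    " and other mainstream document formats into a ",
    "Word",
    " document.",
    " This simplifies translation work.",
]
sentences = merge_fragments(runs)
```

The six input strings collapse into two sentences. In a real document the merged Run would take the character format of the Run it is merged into, a detail this string-only sketch leaves out.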
S6. Create a translator processing document with three fields, "original", "translation" and id, where the "original" field holds the original text of a sentence and the "translation" field holds its translation;
S7. Copy each sentence in the set of data to be translated, in order, into the "original" field of the translator processing document, then replace the content of that sentence in the set of data to be translated with a unique placeholder Guid, where adjacent placeholder Guids have different character formats; for example, adjacent placeholder Guids are given different colors so that their character formats differ from one another. The content of the id field is in one-to-one correspondence with the distinct Guids;
S8. Traverse the translator processing document and mark repeated sentences. As shown in Figure 2, "Android集成指南" ("Android Integration Guide") occurs more than once, so it is marked with a colored band to remind the translator that it does not need to be translated again.
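A minimal sketch of the repeat marking in step S8, assuming the working document is a list of dictionaries with an "original" key. The `repeat` flag stands in for the colored band, and the function name is hypothetical.

```python
def mark_repeats(records):
    """Flag every record whose original text already appeared earlier."""
    seen = set()
    for record in records:
        record["repeat"] = record["original"] in seen
        seen.add(record["original"])
    return records
```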
S9. Traverse the translator processing document and automatically match the sentences in the original text against the terms in the termbase. If a sentence matches a term, annotate that sentence with the term, which makes the translation work go more smoothly.
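The term matching of step S9 could be sketched as below. The source does not say how a sentence is matched against a term, so plain substring containment is an assumption here, as are the function name `annotate_terms` and the `term_notes` key that stands in for the annotation.

```python
def annotate_terms(records, termbase):
    """Attach every termbase entry found in a sentence to that record.

    `termbase` maps a source-language term to its target-language term.
    Matching is by substring containment (an assumption).
    """
    for record in records:
        record["term_notes"] = [
            (term, target)
            for term, target in termbase.items()
            if term in record["original"]
        ]
    return records
```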
S10. Traverse the translator processing document and compare each sentence of the original text against the entries in the corpus. If a sentence matches an entry 100%, fill the corpus translation into the "translation" field of that sentence. In Figure 2, for example, "SDK下载" has the established corpus translation "SDK download", and "常见问题" has the established corpus translation "FAQ". If the match is below 100%, the row is marked with a color to indicate that the sentence still needs to be translated by the translator, but the corpus translation is filled in automatically first for the translator to consult and revise.
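The corpus pre-filling of step S10 can be illustrated with the sketch below. Several things are assumptions rather than part of the source: similarity is measured with Python's `difflib.SequenceMatcher`, the 0.75 cut-off below which no suggestion is offered is invented for the example, and the `match` key stands in for the row color.

```python
from difflib import SequenceMatcher

FUZZY_THRESHOLD = 0.75  # hypothetical cut-off; the source gives no figure


def prefill_from_corpus(records, corpus):
    """Fill translations from the corpus and label each row's match type."""
    for record in records:
        source = record["original"]
        if source in corpus:
            # 100% match: use the corpus translation as is.
            record["translation"] = corpus[source]
            record["match"] = "exact"
            continue
        best_score, best_target = 0.0, ""
        for entry, target in corpus.items():
            score = SequenceMatcher(None, source, entry).ratio()
            if score > best_score:
                best_score, best_target = score, target
        if best_score >= FUZZY_THRESHOLD:
            # Partial match: pre-fill for reference, flag for the translator.
            record["translation"] = best_target
            record["match"] = "fuzzy"
        else:
            record["match"] = "none"
    return records
```

Rows labelled "fuzzy" correspond to the colored rows that the translator still has to review and revise.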
S11. Deliver the translator processing document to the translator, who translates the text in each "original" field in turn and enters the result in the corresponding "translation" field until the whole document has been processed.
S12. Create a dictionary object whose key is the original text and whose value is the translation, so that each original-translation pair forms one key-value pair. While traversing the translator processing document, write the original text and translation of each record into the dictionary object.
S13. Upload these corpus pairs to the server, where they are kept as new corpus entries for use and reference in later translation work.
S14. Traverse the data set to be translated and the translator processing document. For each Guid, use its corresponding id to find the translation for that id, and overwrite the Guid's position in the data set to be translated with that translation.
If the translation field of the record for an id is empty, look up the dictionary object using that record's original text as the key. If a matching translation value is found, fill the translation field with it.
If no matching translation value is found in the dictionary object, the sentence has been missed in translation; the original text is filled in directly so that reviewers can spot it easily.
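The dictionary of step S12 and the write-back with fallback of step S14 can be sketched together as follows. This is an illustrative model only: placeholders are plain Guid strings in document order, `id_by_guid` is a hypothetical lookup from Guid to record id, and the restored document is returned as a list of strings rather than written back through Aspose.

```python
def build_dictionary(records):
    """Original text -> translation, for every record that has a translation."""
    return {r["original"]: r["translation"] for r in records if r["translation"]}


def restore_text(placeholders, id_by_guid, records):
    """Replace each Guid placeholder with its translation.

    Fallback order: the record's own translation, then the dictionary entry
    for the same original text, then the original text itself (missed
    translation, left visible for the reviewer).
    """
    by_id = {r["id"]: r for r in records}
    dictionary = build_dictionary(records)
    restored = []
    for guid in placeholders:
        record = by_id[id_by_guid[guid]]
        translation = record["translation"]
        if not translation:
            translation = dictionary.get(record["original"], record["original"])
        restored.append(translation)
    return restored
```

With this fallback, a sentence that occurs several times only needs a translation in one record; the other occurrences are filled from the dictionary.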
S15. Call the Aspose dynamic link library to restore the data set to be translated to a file in the original document's format.
For the Excel, PPT and PDF documents mentioned in the present invention, a person skilled in the art who has read this description can use the Aspose component to extract the character information contained in these documents and, following the method disclosed by the invention, split and recombine it into a sentence-level data set, generate the translator processing document for the translator, and restore the translated document after translation. For example, Excel, PDF and PPT documents can first be converted into corresponding Word documents with existing tools and then processed by the method of the above embodiment. Excel documents can also be processed directly with the Aspose component.
The present invention has the following beneficial features:
(1) The method converts documents in many different formats into standardized documents of a uniform format. This supports standardized management of translation work, which simplifies the translator's work and improves translation efficiency.
(2) Repeated sentences are found by analyzing all documents of the same batch together. A sentence that occurs several times needs to be translated only at its first occurrence; its translation is filled in automatically at the other positions.
(3) New corpus information produced during translation is saved to a cloud service. The steadily growing corpus gradually improves the translators' own translation ability, and reusing the corpus improves translation efficiency.
(4) All manuscripts to be translated are converted into standard to-be-translated documents of a uniform format, which makes team translation easier to manage and reduces the workload of the translation project manager.
(5) The extraction and restoration logic runs efficiently and has essentially no negative effect on the translation work.
(6) During restoration, the algorithm can be adjusted to the specific case so that the original formatting is preserved in the final result.
Finally, it should be noted that the above specific embodiments serve only to illustrate the technical solution of the present invention and not to limit it. Although the invention has been described in detail with reference to preferred embodiments, a person of ordinary skill in the art should understand that the technical solution of the invention may be modified or replaced by equivalents without departing from its spirit and scope, and all such modifications and replacements fall within the scope of the claims of the present invention.

Claims (8)

  1. A file extraction and restoration method favorable for translation work, characterized by comprising the following steps:
    1) using the document-processing operations supported by the Aspose dynamic link library, disassembling the document object to be translated into a data set to be translated in which a single sentence is the smallest unit;
    2) creating a translator processing document having three fields, "original", "translation" and id, wherein the "original" field holds the source text of a sentence and the "translation" field holds its translation;
    3) copying each sentence in the data set to be translated, one by one and in order, into the "original" field of the translator processing document, and then replacing the content of that sentence in the data set to be translated with a unique placeholder symbol Guid, wherein adjacent placeholder symbols Guid have different character formats, and the content of the id field and the different Guids are in one-to-one correspondence;
    4) delivering the translator processing document to a translator, who translates the text of each "original" field in turn in the translator processing document and enters it in the corresponding "translation" field until processing is complete;
    5) traversing the data set to be translated and the translator processing document, finding, from the id corresponding to each Guid, the translation corresponding to that id, and overwriting the position of the corresponding Guid in the data set to be translated with it;
    6) calling the Aspose dynamic link library to restore the data set to be translated to a document in the original format.
  2. The file extraction and restoration method favorable for translation work according to claim 1, characterized in that disassembling the document object to be translated into a data set to be translated in which a sentence is the smallest unit comprises the following steps:
    1-1 calling the Aspose component;
    1-2 traversing the document object to obtain all paragraph objects, the paragraph objects containing all the text information of the document object and excluding symbols, images and other non-text information that does not need translation;
    1-3 traversing the child node objects of each paragraph object to obtain a number of character collection objects (Run objects);
    1-4 traversing each Run object and splitting all Run objects into Run objects that each contain either exactly one complete sentence or exactly one sentence fragment;
    1-5 traversing each Run object and merging each Run object that contains only a sentence fragment into the subsequent Run object that contains one complete sentence.
  3. The file extraction and restoration method favorable for translation work according to claim 2, characterized in that merging a Run object that contains only a sentence fragment into a subsequent Run object comprises the following steps:
    1-4-1 taking out the character content of a Run object that is only a sentence fragment, storing it in a temporary storage unit, and then deleting the Run object from the paragraph object;
    1-4-2 checking the next Run object; if its character content is only a sentence fragment, taking out that character content, appending it to the temporary storage unit, deleting the Run object from the paragraph object, and continuing with the Run object after it; otherwise, taking the character content held in the temporary storage unit, prepending it to the character content of the next Run object, and then clearing the temporary storage unit;
    1-4-3 if the character content of the next Run object ends with a sentence terminator, taking the character content held in the temporary storage unit, prepending it to the character content of the next Run object, and then clearing the temporary storage unit.
  4. The file extraction and restoration method favorable for translation work according to claim 1, characterized by further comprising creating a dictionary object whose key is the original text and whose value is the translation, each original-translation pair forming one key-value pair; and, while traversing the translator processing document, writing the original text and translation of each record into the dictionary object.
  5. The file extraction and restoration method favorable for translation work according to claim 4, characterized in that, in step 5), if the translation field of the record for an id is empty, the dictionary object is searched, using the original text of that record as the key, for a matching translation value, and if one is found the translation field is filled with it;
    if no matching translation value is found in the dictionary object, the sentence has been missed in translation and the original text is filled in directly.
  6. The file extraction and restoration method favorable for translation work according to claim 1, characterized in that, before the translator processing document is delivered to the translator, the translator processing document is traversed and repeated sentences are marked to remind the translator that they do not need to be translated again.
  7. The file extraction and restoration method favorable for translation work according to claim 1, characterized in that, before the translator processing document is delivered to the translator, the translator processing document is traversed and the sentences in the original text are automatically matched against the terms in a termbase; if a sentence matches a term, that sentence is annotated with the term, making the translation work go more smoothly.
  8. The file extraction and restoration method favorable for translation work according to claim 1, characterized in that, before the translator processing document is delivered to the translator, the translator processing document is traversed and the sentences in the original text are compared one by one against the entries in a corpus; if a sentence matches, the corpus translation is filled into the "translation" field corresponding to the matching sentence.
PCT/CN2015/098668 2015-06-25 2015-12-24 File extraction and restoration method favorable for translation work WO2016206336A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN2015103576722 2015-06-25
CN201510357672.2A CN104933041B (en) 2015-06-25 2015-06-25 A kind of file beneficial to translation is extracted and restoring method

Publications (1)

Publication Number Publication Date
WO2016206336A1 true WO2016206336A1 (en) 2016-12-29

Family

ID=54120210

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2015/098668 WO2016206336A1 (en) 2015-06-25 2015-12-24 File extraction and restoration method favorable for translation work

Country Status (2)

Country Link
CN (1) CN104933041B (en)
WO (1) WO2016206336A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109617974A (en) * 2018-12-21 2019-04-12 珠海金山办公软件有限公司 A kind of request processing method, device and server
CN110555196A (en) * 2018-05-30 2019-12-10 北京百度网讯科技有限公司 method, device, equipment and storage medium for automatically generating article
CN110688863A (en) * 2019-09-25 2020-01-14 六维联合信息科技(北京)有限公司 Document translation system and document translation method
CN112766003A (en) * 2021-01-20 2021-05-07 语联网(武汉)信息技术有限公司 Document auxiliary translation method and device

Families Citing this family (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104933041B (en) * 2015-06-25 2017-09-01 武汉传神信息技术有限公司 A kind of file beneficial to translation is extracted and restoring method
CN106919558B (en) * 2015-12-24 2020-12-01 姚珍强 Translation method and translation device based on natural conversation mode for mobile equipment
CN105808528B (en) * 2016-03-04 2019-01-25 张广睿 A kind of processing method of document text
CN105760368B (en) * 2016-03-11 2019-02-12 张广睿 A kind of deep treatment method of document text
CN105677643A (en) * 2016-03-14 2016-06-15 张广睿 Translation method combining manpower and machine
CN105975451B (en) * 2016-05-27 2019-04-23 成都优译信息技术有限公司 The processing system and its processing method of DWG formatted file translation data
CN106055529B (en) * 2016-05-27 2019-04-23 成都优译信息技术有限公司 The resolution system and its analytic method of text data to be translated in DWG formatted file
CN105975461B (en) * 2016-05-27 2019-04-23 成都优译信息技术有限公司 Increase the method for translation newly in DWG formatted file
CN106021197B (en) * 2016-05-27 2019-04-23 成都优译信息技术有限公司 The translation system and interpretation method of DWG formatted file
CN106021242B (en) * 2016-05-27 2019-04-23 成都优译信息技术有限公司 DWG format drawing interpretation write back data system and its write-back method
CN107590140B (en) * 2017-10-17 2020-09-25 语联网(武汉)信息技术有限公司 Document missing item processing method
CN107885735B (en) * 2017-11-21 2021-05-04 语联网(武汉)信息技术有限公司 Format-independent document translation method and system
CN108563645B (en) * 2018-04-24 2022-03-22 成都智信电子技术有限公司 Metadata translation method and device of HIS (hardware-in-the-system)
CN109446531B (en) * 2018-09-06 2023-05-05 语联网(武汉)信息技术有限公司 Method and device for detecting translation progress and electronic equipment
CN109783826B (en) * 2019-01-15 2023-11-21 四川译讯信息科技有限公司 Automatic document translation method
CN111143074B (en) * 2019-12-30 2024-04-09 文思海辉智科科技有限公司 Method and device for distributing translation files
CN111144070B (en) * 2019-12-31 2023-08-01 北京迈迪培尔信息技术有限公司 Document analysis translation method and device
CN111291575B (en) * 2020-02-28 2023-04-18 北京字节跳动网络技术有限公司 Text processing method and device, electronic equipment and storage medium
CN112052648B (en) * 2020-09-02 2021-11-16 文思海辉智科科技有限公司 String translation method and device, electronic equipment and storage medium
CN113705158A (en) * 2021-09-26 2021-11-26 上海一者信息科技有限公司 Method for intelligently restoring original text style in document translation

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5848386A (en) * 1996-05-28 1998-12-08 Ricoh Company, Ltd. Method and system for translating documents using different translation resources for different portions of the documents
CN102982027A (en) * 2011-09-02 2013-03-20 北大方正集团有限公司 Method and device for abstracting contents in document
CN104331399A (en) * 2014-07-25 2015-02-04 一朵云(北京)科技有限公司 Dictionary tree translation method
CN104933041A (en) * 2015-06-25 2015-09-23 武汉传神信息技术有限公司 File extraction and reduction method favorable for translation work

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110555196A (en) * 2018-05-30 2019-12-10 北京百度网讯科技有限公司 method, device, equipment and storage medium for automatically generating article
CN110555196B (en) * 2018-05-30 2023-07-18 北京百度网讯科技有限公司 Method, apparatus, device and storage medium for automatically generating article
CN109617974A (en) * 2018-12-21 2019-04-12 珠海金山办公软件有限公司 A kind of request processing method, device and server
CN110688863A (en) * 2019-09-25 2020-01-14 六维联合信息科技(北京)有限公司 Document translation system and document translation method
CN110688863B (en) * 2019-09-25 2023-04-07 六维联合信息科技(北京)有限公司 Document translation system and document translation method
CN112766003A (en) * 2021-01-20 2021-05-07 语联网(武汉)信息技术有限公司 Document auxiliary translation method and device

Also Published As

Publication number Publication date
CN104933041A (en) 2015-09-23
CN104933041B (en) 2017-09-01

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15896219

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 15896219

Country of ref document: EP

Kind code of ref document: A1