Method for improving bilingual corpus, device for improving bilingual corpus, machine translation method and machine translation device
A bilingual corpus and machine translation technology, applied in the field of natural language processing. It addresses problems such as segmentation that does not improve word alignment quality, handling of long sentences that ignores context information, and complicated bilingual corpus segmentation methods. Its effects are improved word alignment quality, fewer word alignment errors, and easy extensibility.
Embodiment Construction
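[0080] The preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings.
[0081] Method for Improving a Bilingual Corpus
[0082] This embodiment provides a method for improving a bilingual corpus, wherein the bilingual corpus includes a plurality of sentence pairs in a first language and a second language, together with word alignment information for each sentence pair. The method includes the following steps: extracting segmentation candidates from the word alignment information of a given sentence pair; calculating a segmentation confidence for each segmentation candidate; comparing the segmentation confidence with a predetermined threshold; and, when the segmentation confidence is greater than or equal to the threshold, splitting the given sentence pair at the segmentation candidate.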
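The threshold-and-split step can be sketched in Python as follows. This is only an illustration under stated assumptions: the confidence function is passed in as a parameter because its formula is not given in this excerpt, token positions are 0-based, and the rule of skipping a candidate that would cross an earlier cut on the target side is an assumption added here, not something the text states. The names `split_at_candidates`, `Candidate`, and `toy_confidence` are hypothetical.

```python
from typing import Callable, List, Tuple

# A candidate is a (source index, target index) pair; 0-based here (assumption).
Candidate = Tuple[int, int]
Piece = Tuple[List[str], List[str]]


def split_at_candidates(
    src: List[str],
    tgt: List[str],
    candidates: List[Candidate],
    confidence: Callable[[Candidate], float],
    threshold: float,
) -> List[Piece]:
    """Split (src, tgt) after each candidate whose confidence >= threshold."""
    # Keep only candidates that pass the threshold, ordered by source position.
    accepted = sorted(c for c in candidates if confidence(c) >= threshold)
    pieces: List[Piece] = []
    src_start = tgt_start = 0
    for s_idx, t_idx in accepted:
        # Assumption: ignore a candidate that falls before an earlier cut
        # on either side, so the resulting pieces never overlap.
        if s_idx < src_start or t_idx < tgt_start:
            continue
        pieces.append((src[src_start:s_idx + 1], tgt[tgt_start:t_idx + 1]))
        src_start, tgt_start = s_idx + 1, t_idx + 1
    # Whatever remains after the last cut forms the final piece.
    if src_start < len(src) or tgt_start < len(tgt):
        pieces.append((src[src_start:], tgt[tgt_start:]))
    return pieces


# Toy usage: the confidence values below are made up for illustration.
src = ["你好", ",", "世界", "。"]
tgt = ["hello", ",", "world", "."]
scores = {(1, 1): 0.9, (3, 3): 0.2}


def toy_confidence(c: Candidate) -> float:
    return scores[c]


pieces = split_at_candidates(src, tgt, [(1, 1), (3, 3)], toy_confidence, 0.5)
```

With a threshold of 0.5 only the comma candidate is accepted, so the toy pair is split into two shorter sentence pairs; raising the threshold above 0.9 leaves the pair unsplit.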
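[0083] A detailed description is given below with reference to Figure 1. Figure 1 is a flowchart of the method for improving a bilingual corpus according to this embodiment.
[0084] As shown in Figure 1, first, in step S101, a bilingual sentence pair is selected from the aligned bilingual corpus 10 that is to be improved. In this embodiment, the aligned bilingual corpus 10 includes a plurality of sentence pairs in a first language (source language) and a second language (target language), together with the word alignment information for each sentence pair produced by an automatic word alignment tool. The aligned bilingual corpus 10 is the word alignment result obtained by aligning a bilingual corpus with any word alignment tool known to those skilled in the art, for example the GIZA++ tool. The bilingual corpus may be any bilingual corpus for an SMT system known to those skilled in the art. This embodiment places no restriction on the aligned bilingual corpus 10.
[0085] Next, in step S105, segmentation candidates are extracted from the word alignment information of the selected bilingual sentence pair. The specific process is as follows.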
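[0086] Suppose the source-language sentence of the bilingual sentence pair has m words and the target-language sentence has l words, where m and l are natural numbers.
[0087] The bidirectional word alignment result obtained by GIZA++ is: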
[0088] $a_j = \langle s_j, t_j \rangle,\quad s_j \in [0, 1, \ldots, m],\quad t_j \in [0, 1, \ldots, l]$
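For illustration, alignment links of this form can be held as a list of integer pairs. The sketch below assumes the links are stored one sentence pair per line as whitespace-separated `i-j` tokens, a format commonly used for symmetrized alignments; the text does not say which file format is actually used, and the helper name `parse_links` is hypothetical.

```python
from typing import List, Tuple


def parse_links(line: str) -> List[Tuple[int, int]]:
    """Parse 'i-j' tokens into (source index, target index) pairs."""
    links: List[Tuple[int, int]] = []
    for item in line.split():
        # Each token is assumed to look like "3-4": source position, target position.
        s, t = item.split("-")
        links.append((int(s), int(t)))
    return links


links = parse_links("0-0 1-2 2-1")
```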
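[0089] In step S105, possible segmentation candidates $a_j = \langle s_j, t_j \rangle$ are extracted. In this embodiment, a segmentation candidate preferably satisfies the following conditions:
[0090] (1) $s_j$ and $t_j$ are aligned one-to-one, and
[0091] (2) $s_j$ and $t_j$ are words and/or symbols with a sentence-breaking function.
[0092] The symbols with a sentence-breaking function are preferably punctuation marks, which preferably include, but are not limited to, commas, periods, semicolons, question marks, and exclamation marks.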
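The two conditions above can be sketched as a simple filter over the alignment links. This is a minimal sketch, assuming 0-based token positions (unlike the index ranges in the text, which start at 0 and run up to m and l) and an illustrative punctuation set covering both ASCII and full-width marks; `BREAK_TOKENS` and `extract_candidates` are hypothetical names.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple

# Illustrative set of sentence-breaking symbols (ASCII and full-width forms).
BREAK_TOKENS: Set[str] = {",", ".", ";", "?", "!", ",", "。", ";", "?", "!"}


def extract_candidates(
    src: List[str],
    tgt: List[str],
    links: Iterable[Tuple[int, int]],
    break_tokens: Set[str] = BREAK_TOKENS,
) -> List[Tuple[int, int]]:
    """Return links that are one-to-one and join two sentence-breaking tokens."""
    unique = set(links)
    # Count how many links touch each source and each target position.
    src_count = Counter(s for s, _ in unique)
    tgt_count = Counter(t for _, t in unique)
    return sorted(
        (s, t)
        for s, t in unique
        # Condition (1): the link is the only one on both of its ends.
        if src_count[s] == 1 and tgt_count[t] == 1
        # Condition (2): both aligned tokens have a sentence-breaking function.
        and src[s] in break_tokens and tgt[t] in break_tokens
    )


# Toy usage: the final period is linked twice, so only the comma qualifies.
src = ["我", "来", ",", "他", "走", "。"]
tgt = ["I", "came", ",", "he", "left", "."]
links = [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (5, 4)]
candidates = extract_candidates(src, tgt, links)
```

Sentence-breaking words (as opposed to symbols) could be supported by adding them to the token set passed as `break_tokens`.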
[0...