Method for extracting translation unit table in machine translation
A machine translation technique for translation unit tables, applied to the distributed extraction of phrase tables, hierarchical phrase tables, and lexical ordering models. It addresses two problems: existing extraction programs are time-consuming, and earlier work does not describe how the extraction is implemented.
Examples
Embodiment 1
[0139] This embodiment extracts the phrase table as follows:
[0140] 11. Input the bilingual aligned corpus and the corresponding word alignment file. Set the maximum source-language phrase length to 3 and the maximum target-language phrase length to 5, and allow the extracted phrase pairs to contain empty phrases. For each aligned sentence pair in the corpus, use the word alignment information in the word alignment file to first extract all aligned phrase pairs, recording each pair's word alignment information and number of occurrences. Then merge identical phrase pairs, sum their occurrence counts, and keep the word alignment information that occurs most often. The merged result contains 44,018,003 phrase pairs in total.
[0141] 12. Take the result of step 1 as input, use the Good-Turing method for smoothing, count (c, nc) pairs, and output the result to a file, c and n in th...
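As an illustration of steps 11 and 12, the following is a minimal Python sketch, not the patented implementation. It assumes the common consistency criterion for phrase extraction (no word inside the pair is aligned to a word outside it), and for brevity it omits the empty phrases that the embodiment allows and does not extend spans over unaligned target words. The names `extract_phrase_pairs`, `merge_phrase_pairs`, and `count_of_counts` are hypothetical. The last function produces the (c, n_c) pairs that Good-Turing smoothing consumes, where n_c is the number of distinct phrase pairs seen exactly c times.

```python
from collections import Counter, defaultdict


def extract_phrase_pairs(src, tgt, alignment, max_src=3, max_tgt=5):
    """Yield (source phrase, target phrase, local alignment) for one sentence pair.

    alignment is a list of (source index, target index) links.
    Simplification: empty phrases and unaligned-word extensions are skipped.
    """
    links = set(alignment)
    for i1 in range(len(src)):
        for i2 in range(i1, min(i1 + max_src, len(src))):
            # Target positions linked to the source span [i1, i2].
            tgt_pos = [j for (i, j) in links if i1 <= i <= i2]
            if not tgt_pos:
                continue
            j1, j2 = min(tgt_pos), max(tgt_pos)
            if j2 - j1 + 1 > max_tgt:
                continue
            # Consistency: no target word in [j1, j2] links outside [i1, i2].
            if any(j1 <= j <= j2 and not (i1 <= i <= i2) for (i, j) in links):
                continue
            local = tuple(sorted((i - i1, j - j1)
                                 for (i, j) in links if i1 <= i <= i2))
            yield " ".join(src[i1:i2 + 1]), " ".join(tgt[j1:j2 + 1]), local


def merge_phrase_pairs(corpus):
    """Merge identical phrase pairs: sum counts, keep the most frequent alignment."""
    counts = Counter()
    align_counts = defaultdict(Counter)
    for src, tgt, alignment in corpus:
        for s, t, local in extract_phrase_pairs(src, tgt, alignment):
            counts[(s, t)] += 1
            align_counts[(s, t)][local] += 1
    return {pair: (n, align_counts[pair].most_common(1)[0][0])
            for pair, n in counts.items()}


def count_of_counts(merged):
    """Return a Counter mapping c to n_c for Good-Turing smoothing."""
    return Counter(n for n, _ in merged.values())


corpus = [
    (["das", "haus"], ["the", "house"], [(0, 0), (1, 1)]),
    (["das"], ["the"], [(0, 0)]),
]
merged = merge_phrase_pairs(corpus)
cc = count_of_counts(merged)
```

In a distributed setting, the per-sentence extraction would naturally map to independent workers and the merge to a grouping by phrase pair, but that division is an assumption of this sketch rather than something the excerpt states.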
Embodiment 2
[0144] This embodiment extracts the hierarchical phrase table as follows:
[0145] 21. Input the bilingual aligned corpus and the corresponding word alignment file. Set the maximum source-language phrase length to 3 and the maximum target-language phrase length to 5, and allow the extracted phrase pairs to contain empty phrases. For each aligned sentence pair in the corpus, use the word alignment information in the word alignment file to first extract all aligned hierarchical phrase pairs, recording each pair's word alignment information and number of occurrences. Then merge identical hierarchical phrase pairs, sum their occurrence counts, and keep the word alignment information that occurs most often. The merged result contains 430,252,258 hierarchical phrase pairs in total.
[0146] 22. Take the result of step 1 as input, use the Good-Turing method for smoothing, and count (...
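The excerpt does not spell out how a hierarchical phrase pair is built. A minimal sketch, assuming the usual construction in which an aligned sub-phrase pair inside a larger phrase pair is replaced by a nonterminal `X` on both sides, could look like this. The function name `make_hierarchical_rule` and the span format are hypothetical.

```python
def make_hierarchical_rule(src, tgt, outer, inner, symbol="X"):
    """Replace an inner phrase pair with a nonterminal inside an outer phrase pair.

    outer and inner are (i1, i2, j1, j2) tuples of inclusive token indices:
    source span [i1, i2] and target span [j1, j2].
    """
    oi1, oi2, oj1, oj2 = outer
    ii1, ii2, ij1, ij2 = inner
    if not (oi1 <= ii1 <= ii2 <= oi2 and oj1 <= ij1 <= ij2 <= oj2):
        raise ValueError("inner span must lie inside outer span")
    if inner == outer:
        raise ValueError("inner span must be strictly smaller than outer span")
    # Keep the words before and after the gap, and put the symbol in the gap.
    src_side = src[oi1:ii1] + [symbol] + src[ii2 + 1:oi2 + 1]
    tgt_side = tgt[oj1:ij1] + [symbol] + tgt[ij2 + 1:oj2 + 1]
    return " ".join(src_side), " ".join(tgt_side)


rule = make_hierarchical_rule(
    ["das", "rote", "haus"],
    ["the", "red", "house"],
    outer=(0, 2, 0, 2),
    inner=(1, 1, 1, 1),
)
```

Because every phrase pair can yield several such rules, one per eligible sub-phrase pair, a hierarchical table is typically much larger than the plain phrase table, which is consistent with the counts reported in Embodiments 1 and 2.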
Embodiment 3
[0149] This embodiment extracts the lexical ordering model as follows:
[0150] 31. Input the bilingual aligned corpus and the corresponding word alignment file. Set the maximum length of both the source-language phrase and the target-language phrase to 7, and do not allow empty phrases in the extracted phrase pairs. For each aligned sentence pair in the corpus, use the word alignment information in the word alignment file to extract all aligned phrase pairs and their corresponding ordering rules, and output them to a file. The result file contains 228,514,143 unmerged phrase pairs.
[0151] 32. According to the result of step 1, count the total number of appearances of each ordering rule, among which the number of occurrences of the mono rule in the upper direction is 150,367,615, the number of occurrences of the swap rule is 14,918,685, the number of occurrences of the discontinuous rule is 63,227,843; the number of oc...
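The excerpt names three ordering rules (mono, swap, discontinuous) without defining them. The sketch below assumes the conventional definitions from lexicalized reordering models, comparing the source span of the current phrase with the source span of the phrase translated just before it. The function names and the span format are hypothetical. The final lines check that the three counts reported in step 32 add up to the number of unmerged phrase pairs reported in step 31.

```python
from collections import Counter


def orientation(prev_src_span, cur_src_span):
    """Classify the ordering rule of the current phrase relative to the previous one.

    Spans are (start, end) tuples of inclusive source token indices.
    """
    p1, p2 = prev_src_span
    c1, c2 = cur_src_span
    if p2 + 1 == c1:
        return "mono"           # current phrase directly follows the previous one
    if c2 + 1 == p1:
        return "swap"           # current phrase directly precedes the previous one
    return "discontinuous"      # anything else


def tally(span_pairs):
    """Count how often each ordering rule occurs."""
    return Counter(orientation(p, c) for p, c in span_pairs)


example = tally([((0, 1), (2, 3)), ((2, 3), (0, 1)), ((0, 0), (3, 4))])

# Counts reported in step 32 for the first direction.
reported = {"mono": 150367615, "swap": 14918685, "discontinuous": 63227843}
total_reported = sum(reported.values())
```

The three reported counts sum to 228,514,143, which equals the number of unmerged phrase pairs from step 31. In other words, each extracted phrase pair contributes exactly one ordering rule in this direction.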