Corpus word segmentation preprocessing method for machine translation
A preprocessing technology for machine translation, applicable to natural language data processing, neural learning methods, natural language translation, and related fields. It addresses problems such as vocabulary entries wasted on placeholders and unsuitable segmentation granularity, improving word segmentation accuracy and reducing wasted vocabulary entries.
Examples
Embodiment 1
[0035] As shown in Figure 1, this embodiment provides a corpus word segmentation preprocessing method for machine translation, including the following steps:
[0036] Step S1: Clean the original corpus according to language rules;
[0037] As a preferred option in this embodiment, the data cleaning described in step S1 includes:
[0038] Remove empty lines; remove sentence pairs whose sentence-final punctuation is not aligned; remove HTML markup; remove rigid characters; remove sentences containing a third-party language; remove garbled text; run an alignment algorithm over the sentences and remove those with poor alignment; deduplicate using the original text and translation as the key; convert Traditional Chinese to Simplified Chinese.
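Several of the cleaning rules above can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation described in the patent: the names `clean_pairs`, `end_class`, `TAG_RE`, and `END_PUNCT` are hypothetical, the punctuation table is an assumed minimal set, and garbled text is approximated by the presence of the Unicode replacement character. Third-party-language detection, alignment scoring, and Traditional-to-Simplified conversion need external tools and are left out.

```python
import html
import re

# Hypothetical helpers; the source does not specify an implementation.
TAG_RE = re.compile(r"<[^>]+>")
# Assumed minimal table: map full-width and ASCII sentence-final marks
# to one class so that "。" and "." compare as equal.
END_PUNCT = {"。": ".", ".": ".", "？": "?", "?": "?", "！": "!", "!": "!"}


def end_class(sentence):
    """Return the class of the final punctuation mark, or None if absent."""
    return END_PUNCT.get(sentence[-1]) if sentence else None


def clean_pairs(pairs):
    """Yield cleaned (source, target) pairs from an iterable of raw pairs."""
    seen = set()
    for src, tgt in pairs:
        # Strip HTML tags, then decode entities such as &amp;
        src = html.unescape(TAG_RE.sub("", src)).strip()
        tgt = html.unescape(TAG_RE.sub("", tgt)).strip()
        if not src or not tgt:
            continue  # empty line on either side
        if "\ufffd" in src or "\ufffd" in tgt:
            continue  # U+FFFD signals bytes that failed to decode
        if end_class(src) != end_class(tgt):
            continue  # sentence-final punctuation is not aligned
        key = (src, tgt)
        if key in seen:
            continue  # duplicate keyed on (original, translation)
        seen.add(key)
        yield src, tgt


raw = [
    ("<p>Hello world.</p>", "你好，世界。"),
    ("", "空"),
    ("Hello world.", "你好，世界。"),
    ("Is it?", "是的。"),
    ("Tom &amp; Jerry!", "汤姆和杰瑞！"),
]
cleaned = list(clean_pairs(raw))
```

In this sketch the filters run as a single streaming pass, so the only state held in memory is the set of pairs already seen; for a very large corpus that set could be replaced with hashes of the pairs.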
[0039] Step S2: Perform symbol standardization on the cleaned corpus;
[0040] In this embodiment, the symbol standardization process described in th...
