Lao-Chinese bilingual corpus construction method and device with Thai as the pivot language

A bilingual corpus construction technology in the field of natural language processing that addresses problems such as the difficulty of obtaining Lao-Chinese bilingual parallel resources and the scarcity of such resources.
CN110717341A · Active · Publication Date: 2020-01-21 · KUNMING UNIV OF SCI & TECH

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Application (China)
Current Assignee / Owner: KUNMING UNIV OF SCI & TECH
Publication Date: 2020-01-21

Figures

  • Figure 1
  • Figure 2
  • Figure 3

Abstract

The invention relates to a Lao-Chinese bilingual corpus construction method and device that use Thai as a pivot language, and belongs to the field of natural language processing. The method comprises the following steps: first, performing Thai word segmentation on Chinese-Thai parallel corpus data; constructing a Lao-Thai bilingual dictionary and using it to translate Thai sentences word by word into Lao word sequences, yielding candidate Lao-Thai parallel sentence pairs; constructing a Lao-Thai parallel sentence pair classification model based on a bidirectional LSTM and using it to classify the candidate pairs, yielding Lao-Thai bilingual parallel sentence pairs; and using Thai as the pivot language to match Lao sentences with Chinese sentences, thereby constructing a Lao-Chinese bilingual parallel corpus. The Lao-Chinese bilingual parallel corpus construction device, which takes Thai as the pivot language, solves the problem of scarce Lao-Chinese corpora and has theoretical significance and practical application value for building Lao-Chinese bilingual corpora.
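The four steps in the abstract can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the toy dictionary entries, the function names (`translate_word_by_word`, `dictionary_coverage`, `build_lao_thai_pairs`, `pivot_join`) and the 0.8 threshold are all hypothetical; the Thai side is assumed to be word-segmented already; and where the patent accepts or rejects candidate pairs with a bidirectional LSTM classifier, the sketch substitutes a simple dictionary-coverage filter as a stand-in.

```python
from collections import defaultdict

# Hypothetical toy Lao-Thai dictionary (Thai word -> Lao word); illustrative only.
LAO_THAI_DICT = {
    "ฉัน": "ຂ້ອຍ",   # "I"
    "กิน": "ກິນ",    # "eat"
    "ข้าว": "ເຂົ້າ",  # "rice"
}


def translate_word_by_word(thai_tokens, lao_thai_dict):
    """Translate segmented Thai tokens into a Lao token sequence.

    Tokens missing from the dictionary are copied unchanged (an assumption).
    """
    return [lao_thai_dict.get(tok, tok) for tok in thai_tokens]


def dictionary_coverage(thai_tokens, lao_thai_dict):
    """Return the fraction of Thai tokens that the dictionary covers."""
    if not thai_tokens:
        return 0.0
    hits = sum(1 for tok in thai_tokens if tok in lao_thai_dict)
    return hits / len(thai_tokens)


def build_lao_thai_pairs(thai_sentences, lao_thai_dict, threshold=0.8):
    """Build candidate Lao-Thai pairs and keep those passing a placeholder filter.

    The patent classifies candidates with a BiLSTM model; this coverage
    threshold is only a hypothetical stand-in for that classifier.
    """
    pairs = []
    for thai_tokens in thai_sentences:
        lao_tokens = translate_word_by_word(thai_tokens, lao_thai_dict)
        if dictionary_coverage(thai_tokens, lao_thai_dict) >= threshold:
            # Lao, like Thai, is normally written without spaces between words.
            pairs.append(("".join(lao_tokens), tuple(thai_tokens)))
    return pairs


def pivot_join(zh_th_pairs, lao_th_pairs):
    """Pair Lao and Chinese sentences that share an identical segmented Thai sentence."""
    zh_by_thai = defaultdict(list)
    for zh, thai_tokens in zh_th_pairs:
        zh_by_thai[tuple(thai_tokens)].append(zh)
    lao_zh_pairs = []
    for lao, thai_key in lao_th_pairs:
        for zh in zh_by_thai.get(thai_key, []):
            lao_zh_pairs.append((lao, zh))
    return lao_zh_pairs


if __name__ == "__main__":
    # Chinese-Thai pairs whose Thai side has already been word-segmented.
    zh_th = [("我吃饭", ["ฉัน", "กิน", "ข้าว"]), ("你好", ["สวัสดี"])]
    lao_th = build_lao_thai_pairs([th for _, th in zh_th], LAO_THAI_DICT)
    print(pivot_join(zh_th, lao_th))
```

In this toy run, the second sentence is dropped because none of its Thai tokens are in the dictionary, so only the first Chinese sentence receives a Lao counterpart. Joining on the full segmented Thai sentence keeps the pivot step exact; a real pipeline would presumably need both corpora to share the same Thai segmentation for such matches to occur.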

Description

Technical Field

[0001] The invention relates to a method and device for constructing a Lao-Chinese bilingual corpus with Thai as the pivot language, and belongs to the technical field of natural language processing.

Background Art

[0002] Corpus construction is a prerequisite for natural language processing research. A Lao-Chinese bilingual corpus is an important data resource for Chinese-Lao machine translation and cross-language retrieval. Lao is a resource-scarce language even among Southeast Asian languages: Lao-Chinese bilingual parallel resources are scarce, and it is difficult to obtain Lao-Chinese parallel data directly from the Internet.

[0003] Lao and Thai both belong to the Zhuang-Dai branch of the Zhuang-Dong group of the Sino-Tibetan language family. Their basic vocabularies are largely identical or similar, and their syntactic structures are also highly similar. The Chinese-Thai parallel corpus is relatively easy ...
