Weak supervision Chinese-Vietnamese bilingual dictionary construction method based on English pivot

A bilingual dictionary construction method, applied in the field of cross-language natural language processing. It addresses the scarcity of labeled resources, the difficulty of labeling them, and the poor results of existing methods, and thereby improves the accuracy of dictionary construction.

Active Publication Date: 2020-06-19
KUNMING UNIV OF SCI & TECH
Cites: 6 | Cited by: 13

AI Technical Summary

Problems solved by technology

[0005] The invention provides a weakly supervised Chinese-Vietnamese bilingual dictionary construction method based on an English pivot, which addresses the problem that parallel corpora ...

Examples

Embodiment 1

[0040] Embodiment 1: As shown in Figures 1-5, this embodiment provides a weakly supervised Chinese-Vietnamese bilingual dictionary construction method based on an English pivot. Figure 1 is a flowchart of the method, which mainly includes the following steps:

[0041] Step A: Collect monolingual corpora of Chinese, English and Vietnamese, and preprocess them.
Step B: Using a seed-dictionary-based method, align the Chinese word vectors and the Vietnamese word vectors, respectively, to the shared English word-vector space.
Step C: In the shared English word-vector space, learn the mapping between Chinese and Vietnamese word vectors through an adversarial network.
Step D: Extract the Chinese-Vietnamese dictionary using different extraction strategies, and calculate its accuracy.
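Step B can be sketched as follows. The text does not say which seed-dictionary alignment it uses. This minimal sketch assumes orthogonal Procrustes, a common choice for supervised word-embedding alignment. The function names (`procrustes`, `align_to_english`) and the index-pair format of the seed dictionary are hypothetical. Running the mapping once for Chinese and once for Vietnamese places both languages in the English space, which is what lets English act as the pivot.

```python
import numpy as np


def procrustes(src: np.ndarray, tgt: np.ndarray) -> np.ndarray:
    """Return the orthogonal W minimizing ||src @ W - tgt||_F.

    Rows of src and tgt are the vectors of seed-dictionary translation pairs.
    """
    # SVD of the cross-covariance src^T tgt = U S V^T gives the solution W = U V^T.
    u, _, vt = np.linalg.svd(src.T @ tgt)
    return u @ vt


def align_to_english(lang_emb: np.ndarray, en_emb: np.ndarray,
                     seed_pairs: list[tuple[int, int]]) -> tuple[np.ndarray, np.ndarray]:
    """Map every row of lang_emb into the English embedding space.

    seed_pairs holds (row in lang_emb, row in en_emb) pairs from a small,
    hypothetical seed dictionary; only these rows are used to fit W.
    """
    src_rows = [s for s, _ in seed_pairs]
    tgt_rows = [t for _, t in seed_pairs]
    w = procrustes(lang_emb[src_rows], en_emb[tgt_rows])
    return lang_emb @ w, w


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    en = rng.normal(size=(50, 8))
    # Synthetic "Chinese" space: a hidden random rotation of the English one.
    q, _ = np.linalg.qr(rng.normal(size=(8, 8)))
    zh = en @ q
    zh_in_en, w = align_to_english(zh, en, [(i, i) for i in range(20)])
    print(np.allclose(zh_in_en, en))  # → True (W recovers the inverse rotation q.T)
```

Constraining W to be orthogonal preserves distances and angles within each monolingual space. This matters when only a small seed dictionary is available, because the mapping has little freedom to overfit the seed pairs.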

[0042] Further, step A includes the following sub-steps: step A01, different web crawler programs are written for ...

Abstract

The invention relates to a weakly supervised Chinese-Vietnamese bilingual dictionary construction method based on an English pivot, and belongs to the technical field of natural language processing. The method comprises the following steps: collecting monolingual corpora of Chinese, English and Vietnamese respectively and preprocessing them; aligning the Chinese and Vietnamese word vectors to a shared English word-vector space using a seed-dictionary-based method; learning a mapping between the Chinese and Vietnamese word vectors through an adversarial network in the shared English word-vector space; and extracting the Chinese-Vietnamese dictionary using different extraction strategies. The method greatly improves the accuracy of automatically constructed Chinese-Vietnamese dictionaries. It addresses two problems with existing Chinese-Vietnamese bilingual dictionary construction methods: the parallel corpora, seed dictionaries and similar resources they need are very scarce and difficult to label, and their construction results are poor.
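This excerpt does not name the "different extraction strategies". As an assumption, the sketch below compares two common rules for reading a dictionary off two aligned embedding spaces. The first is plain cosine nearest neighbour. The second is CSLS (cross-domain similarity local scaling), which penalizes "hub" target words that are close to many source words. Accuracy is reported as precision@1 against a gold dictionary. All function names and the toy 2-D vectors are hypothetical.

```python
import numpy as np


def _unit(m: np.ndarray) -> np.ndarray:
    """L2-normalize rows so that dot products are cosine similarities."""
    return m / np.maximum(np.linalg.norm(m, axis=1, keepdims=True), 1e-12)


def nearest_neighbor(src: np.ndarray, tgt: np.ndarray) -> np.ndarray:
    """For each source word, the index of the most cosine-similar target word."""
    return (_unit(src) @ _unit(tgt).T).argmax(axis=1)


def csls(src: np.ndarray, tgt: np.ndarray, k: int = 10) -> np.ndarray:
    """CSLS retrieval: score(x, y) = 2 cos(x, y) - r_T(x) - r_S(y)."""
    sims = _unit(src) @ _unit(tgt).T  # shape (n_src, n_tgt)
    k_t, k_s = min(k, sims.shape[1]), min(k, sims.shape[0])
    # r_T(x): mean similarity of each source word to its k nearest targets.
    r_src = np.sort(sims, axis=1)[:, -k_t:].mean(axis=1)
    # r_S(y): mean similarity of each target word to its k nearest sources.
    r_tgt = np.sort(sims, axis=0)[-k_s:, :].mean(axis=0)
    return (2 * sims - r_src[:, None] - r_tgt[None, :]).argmax(axis=1)


def precision_at_1(pred: np.ndarray, gold: dict[int, set[int]]) -> float:
    """Fraction of gold source words whose top prediction is a valid translation."""
    return sum(int(pred[s]) in ok for s, ok in gold.items()) / len(gold)


def on_circle(degrees: list[float]) -> np.ndarray:
    """Toy 2-D unit vectors at the given angles."""
    rad = np.deg2rad(degrees)
    return np.stack([np.cos(rad), np.sin(rad)], axis=1)


if __name__ == "__main__":
    zh = on_circle([-10, 19])  # two mapped "Chinese" words
    vi = on_circle([0, 40])    # target 0 is a hub close to both sources
    gold = {0: {0}, 1: {1}}
    nn, cs = nearest_neighbor(zh, vi), csls(zh, vi, k=1)
    print(nn.tolist(), precision_at_1(nn, gold))  # → [0, 0] 0.5
    print(cs.tolist(), precision_at_1(cs, gold))  # → [0, 1] 1.0
```

In the toy case the hub at 0° is the nearest neighbour of both source words. CSLS discounts the hub by its high similarity to other sources and recovers the second pair. On real vocabularies k = 10 is a common setting (for example, in MUSE).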

Description

Technical field

[0001] The invention relates to a weakly supervised Chinese-Vietnamese bilingual dictionary construction method based on an English pivot, and belongs to the technical field of cross-language natural language processing.

Background art

[0002] Exchanges between China and the countries along the route are becoming increasingly frequent, and language translation in cross-language communication has attracted much attention. Vietnam is one of the countries along the route, and its exchanges with China are growing closer. As a basic resource in cross-language natural language processing, bilingual dictionaries have important research value: their quality strongly affects NLP tasks such as information retrieval, machine translation, and cross-language annotation projection. However, manual construction of large-scale Chinese-Vietnamese bilingual dictionaries requires the participatio...

Application Information

IPC (8): G06F40/42, G06F40/58, G06F40/284, G06N3/08, G06F16/951
CPC: G06F16/951, G06N3/088
Inventors: 余正涛 (Yu Zhengtao), 陈亚豪 (Chen Yahao), 张亚飞 (Zhang Yafei), 文永华 (Wen Yonghua), 朱俊国 (Zhu Junguo), 高盛祥 (Gao Shengxiang)
Owner KUNMING UNIV OF SCI & TECH