Semi-supervised adversarial learning cross-language abstract generation method based on word alignment

A semi-supervised, word-alignment-based technique applied to natural language translation, natural language data processing, semantic analysis, and related fields. It addresses the poor quality of translation and the dependence of cross-language summarization on translation, and it achieves cross-language abstract generation with improved results.

Active Publication Date: 2021-03-23
KUNMING UNIV OF SCI & TECH

AI Technical Summary

Problems solved by technology

[0003] The present invention provides a semi-supervised adversarial learning method for generating cross-language summaries based on word alignment. It addresses two problems: texts in different languages are difficult to represent in the same feature space, and it is unclear how to use text representations in a shared space to perform cross-language summarization. It also removes the need to rely on translation for cross-language summarization, where translation quality is poor.

Examples

Embodiment 1

[0044] Example 1: As shown in Figures 1-2, the semi-supervised adversarial learning cross-language abstract generation method based on word alignment includes the following steps:

[0045] Step 1: Collect news texts for training Chinese-Vietnamese cross-language abstract generation, and obtain existing Chinese-Vietnamese bilingual word vectors. The Chinese data come from the LCSTS corpus, which is compiled mainly from Sina Weibo; each entry consists of two parts: the short text content and the corresponding reference summary. The Vietnamese texts are obtained by translating the LCSTS corpus with Google Translate, which yields pseudo-parallel corpora: about 200,000 pseudo-parallel texts, plus about 1,000 pseudo-parallel texts for testing. In addition, web crawlers are used to collect news from Chinese news websites such as China News Network, Xinhuanet, and Sina News, and from Vietnamese news websites such as Vietnam Daily News, Vietnamese Economic Daily, and Vietnam News Site. The collected data contain news headlines, text...
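The corpus construction in Step 1 can be illustrated with a minimal sketch. It assumes each source entry is a (short text, reference summary) pair and that translation is supplied as a plain callable; the names PseudoParallelEntry, build_pseudo_parallel, and split_train_test are hypothetical and do not come from the patent, and no real translation service is called here.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Sequence, Tuple


@dataclass(frozen=True)
class PseudoParallelEntry:
    zh_text: str     # short text content from the Chinese corpus
    zh_summary: str  # corresponding reference summary
    vi_text: str     # machine-translated Vietnamese text (pseudo-parallel side)


def build_pseudo_parallel(
    entries: Iterable[Tuple[str, str]],
    translate: Callable[[str], str],
) -> List[PseudoParallelEntry]:
    """Pair each (text, summary) entry with a machine translation of its text.

    `translate` is a stand-in for whatever translation tool is used; it is an
    assumption of this sketch, not an API named in the source.
    """
    corpus: List[PseudoParallelEntry] = []
    for text, summary in entries:
        text, summary = text.strip(), summary.strip()
        if not text or not summary:
            continue  # skip entries missing either part
        corpus.append(PseudoParallelEntry(text, summary, translate(text)))
    return corpus


def split_train_test(
    corpus: Sequence[PseudoParallelEntry], test_size: int
) -> Tuple[List[PseudoParallelEntry], List[PseudoParallelEntry]]:
    """Hold out the last `test_size` entries as the test set."""
    if test_size <= 0:
        return list(corpus), []
    return list(corpus[:-test_size]), list(corpus[-test_size:])
```

With a corpus of the size described above, a call such as split_train_test(corpus, 1000) would hold out roughly the test portion; how the patent actually partitions its data is not stated in the visible text.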

Abstract

The invention relates to a semi-supervised adversarial learning cross-language abstract generation method based on word alignment, and belongs to the technical field of natural language processing. The method comprises the following steps: collecting news texts for training Chinese-Vietnamese cross-language abstract generation, and obtaining existing Chinese-Vietnamese bilingual word vectors; pre-training a monolingual abstract model and the semi-supervised adversarial learning using the Chinese-Vietnamese news texts and the Chinese-Vietnamese bilingual word vectors, respectively; using a BERT encoder to produce vector representations of the input Chinese-Vietnamese pseudo-parallel corpora; carrying out semi-supervised adversarial learning by combining the vectors obtained from the encoder with the Chinese-Vietnamese bilingual seed dictionary to obtain vectors mapped into the same semantic space; and taking the context text vectors and the reference abstract mapped into the same semantic space as input to a Transformer decoder, which decodes them to output a target-language abstract. The method thereby realizes the cross-language abstract generation task and improves the quality of the generated cross-language abstracts.
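The idea of mapping encoder vectors into one semantic space with both an adversarial signal and a seed dictionary can be sketched as a single loss for the mapping. This is only an illustration under stated assumptions: the mapping is taken to be a linear matrix W, the discriminator a linear logistic classifier, and the seed-dictionary term a mean squared distance. None of these choices, nor the names mapping_loss and seed_weight, are given in the visible patent text.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(W: Matrix, x: Sequence[float]) -> Vector:
    """Apply the (assumed linear) mapping W to a source-language vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def discriminator(theta: Sequence[float], bias: float, x: Sequence[float]) -> float:
    """Probability that x is a mapped source vector (hypothetical linear discriminator)."""
    return sigmoid(sum(t * xi for t, xi in zip(theta, x)) + bias)


def mapping_loss(
    W: Matrix,
    theta: Sequence[float],
    bias: float,
    source_vecs: Sequence[Sequence[float]],
    seed_pairs: Sequence[Tuple[Sequence[float], Sequence[float]]],
    seed_weight: float = 1.0,
) -> float:
    """Loss for the mapping: fool the discriminator and align seed-dictionary pairs."""
    eps = 1e-12
    # Unsupervised (adversarial) term: mapped source vectors should look like
    # target-language vectors, i.e. the discriminator output should be low.
    adv = -sum(
        math.log(1.0 - discriminator(theta, bias, matvec(W, x)) + eps)
        for x in source_vecs
    ) / len(source_vecs)
    # Supervised term: for each seed-dictionary pair (x, y), W x should be close to y.
    sup = sum(
        sum((a - b) ** 2 for a, b in zip(matvec(W, x), y))
        for x, y in seed_pairs
    ) / len(seed_pairs)
    return adv + seed_weight * sup
```

In this sketch the "semi-supervised" aspect is the sum of the two terms: the adversarial term needs no bilingual labels, while the seed dictionary supplies a small amount of supervision. A full training loop would alternate updates of the discriminator and of W, which is omitted here.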

Description

Technical field

[0001] The present invention relates to a semi-supervised adversarial learning cross-language abstract generation method based on word alignment, and belongs to the technical field of natural language processing.

Background technique

[0002] Cross-language summary generation is an active problem in current natural language processing research. As the number of issues of common concern to China and Vietnam grows, related news reports are also increasing. Using cross-language summarization to obtain summaries of Vietnamese news is of great significance for keeping each country informed of the other in a timely way and for promoting the common development of the two countries. Currently, translation technology for low-resource languages is not yet mature, and texts in different languages are difficult to represent in the same feature space when summarizing cross-language news text. Therefore, it is of great significance to use artificial intelligence technology to automatically generate the abst...

Application Information

Patent Type & Authority: Applications (China)
IPC (8): G06F40/242, G06F40/30, G06F40/44, G06F16/34
CPC: G06F40/242, G06F40/30, G06F16/345, G06F40/44
Inventor: 余正涛, 张莹, 黄于欣, 高盛祥, 郭军军, 相艳
Owner: KUNMING UNIV OF SCI & TECH