
Chinese-Vietnamese cross-language abstract generation method guided by multi-type word information

A cross-language summarization technology guided by multi-type word information, applied in neural learning methods, natural language translation, and natural language data processing. It addresses the loss of important information and the difficulty of bilingual semantic alignment, and aims at more accurate results.

Pending Publication Date: 2022-06-21
KUNMING UNIV OF SCI & TECH

AI Technical Summary

Problems solved by technology

[0004] The invention provides a Chinese-Vietnamese cross-language summary generation method guided by multi-type word information. It addresses two problems in Chinese-Vietnamese cross-language summary generation: the loss of important information and the difficulty of bilingual semantic alignment.



Examples


Embodiment 1

[0055] Example 1: As shown in Figure 1 and Figure 2, the specific steps of the Chinese-Vietnamese cross-language abstract generation method guided by multi-type word information are as follows:

[0056] Step 1. Corpus data set construction: obtain a Chinese-Vietnamese cross-language abstract data set and a Chinese-Vietnamese translation data set;

[0057] Step 2. Preprocessing of the Chinese-Vietnamese corpus: perform word segmentation on the Chinese and Vietnamese data;

[0058] Step 3. Obtain keywords in the Chinese text: for the Chinese text, use a greedy algorithm to extract the important sentences with the largest ROUGE value, and then use the TextRank algorithm to extract keywords;

[0059] Step 4. Build a Chinese-Vietnamese bilingual probability dictionary: using the constructed Chinese-Vietnamese translation data set, apply the fast-align method to build a Chinese-Vietnamese bilingual probability dictionar...
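
Steps 3 and 4 can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not the patent's implementation: `rouge1_f` is a simplified unigram-overlap F1 standing in for ROUGE, `greedy_select` and `build_prob_dict` are hypothetical helper names, the TextRank keyword step is not shown, and the dictionary is assumed to be estimated by normalizing aligned word-pair counts. The alignment strings are assumed to be in the `i-j` (source index, target index) format that fast_align writes, one line per sentence pair.

```python
from collections import Counter, defaultdict


def rouge1_f(candidate, reference):
    """Unigram-overlap F1 between two token lists (a simplified ROUGE-1)."""
    if not candidate or not reference:
        return 0.0
    overlap = sum((Counter(candidate) & Counter(reference)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(candidate)
    recall = overlap / len(reference)
    return 2 * precision * recall / (precision + recall)


def greedy_select(sentences, reference, max_sentences=3):
    """Greedily add the sentence that most improves the score of the selection.

    sentences: list of token lists; reference: token list of the summary.
    Stops when no remaining sentence raises the score (assumed stop rule).
    """
    selected, best = [], 0.0
    while len(selected) < max_sentences:
        scored = []
        for i in range(len(sentences)):
            if i in selected:
                continue
            tokens = [t for j in sorted(selected + [i]) for t in sentences[j]]
            scored.append((rouge1_f(tokens, reference), i))
        if not scored:
            break
        # Highest score wins; ties go to the earlier sentence.
        score, idx = max(scored, key=lambda s: (s[0], -s[1]))
        if score <= best:
            break
        selected.append(idx)
        best = score
    return sorted(selected)


def build_prob_dict(parallel_pairs, alignments):
    """Estimate p(target word | source word) from word-alignment links.

    parallel_pairs: list of (source_tokens, target_tokens) tuples.
    alignments: one string per pair, e.g. "0-0 1-2" (source-target indices).
    """
    counts = defaultdict(Counter)
    for (src, tgt), line in zip(parallel_pairs, alignments):
        for link in line.split():
            i, j = map(int, link.split("-"))
            counts[src[i]][tgt[j]] += 1
    return {
        s: {t: c / sum(ctr.values()) for t, c in ctr.items()}
        for s, ctr in counts.items()
    }


# Toy usage with made-up data.
doc = [["a", "b"], ["c", "d"], ["x", "y"]]
important = greedy_select(doc, ["a", "b", "c"])
pairs = [(["我", "爱"], ["tôi", "yêu"]), (["爱"], ["thương"])]
prob_dict = build_prob_dict(pairs, ["0-0 1-1", "0-0"])
```

Normalizing per source word keeps each dictionary entry a proper distribution over its observed translations, which makes it easy to combine with attention weights later.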



Abstract

The invention relates to a Chinese-Vietnamese cross-language abstract generation method guided by multi-type word information, and belongs to the technical field of natural language processing. The method comprises the following steps: constructing a corpus data set; performing word segmentation preprocessing on the Chinese-Vietnamese corpus; obtaining keywords in the Chinese text; constructing a Chinese-Vietnamese bilingual probability dictionary; and guiding generation of the abstract using the multi-type word information. First, an additional keyword encoder encodes the important information of the source text under the guidance of explicit keyword information. Then, an attention mechanism is introduced, and the word alignment information in the bilingual probability dictionary is used to dynamically guide the translation probability distribution over words copied from the source text that carry key information, achieving bilingual alignment of the key information. Finally, on the basis of a pointer-generator network, a gating mechanism connects the translation probability distribution and the neural probability distribution to generate the Vietnamese abstract. The method achieves good results on the Chinese-Vietnamese cross-language abstract task.
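
The gated combination described in the abstract can be illustrated with a small sketch. The abstract does not give the formula, so this assumes the usual pointer-generator form, P(w) = p_gen * P_neural(w) + (1 - p_gen) * P_trans(w), where P_trans(w) sums attention weight times dictionary translation probability over the source keywords. The function name `mix_distributions`, the fixed scalar `p_gen` (which would be predicted by the model at each decoding step), and all values are hypothetical.

```python
def mix_distributions(p_neural, attention, src_tokens, prob_dict, p_gen):
    """Combine neural and dictionary-based distributions with a gate.

    p_neural: dict mapping target word -> probability from the decoder.
    attention: attention weights over src_tokens (assumed to sum to 1).
    prob_dict: dict mapping source word -> {target word: probability}.
    p_gen: gate in [0, 1]; a fixed scalar here, learned in a real model.
    """
    # Translation distribution: attention-weighted dictionary lookups.
    p_trans = {}
    for weight, token in zip(attention, src_tokens):
        for tgt_word, prob in prob_dict.get(token, {}).items():
            p_trans[tgt_word] = p_trans.get(tgt_word, 0.0) + weight * prob

    vocab = set(p_neural) | set(p_trans)
    return {
        w: p_gen * p_neural.get(w, 0.0) + (1 - p_gen) * p_trans.get(w, 0.0)
        for w in vocab
    }


# Toy usage with made-up numbers.
toy_dict = {"我": {"tôi": 1.0}, "爱": {"yêu": 0.5, "thương": 0.5}}
final = mix_distributions(
    p_neural={"tôi": 0.5, "yêu": 0.5},
    attention=[0.25, 0.75],
    src_tokens=["我", "爱"],
    prob_dict=toy_dict,
    p_gen=0.5,
)
```

If a source token has no dictionary entry, its attention mass is dropped in this sketch, so the result sums to less than 1; a fuller version would need to renormalize or handle such tokens explicitly.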

Description

Technical field

[0001] The invention relates to a method for generating Chinese-Vietnamese cross-language abstracts guided by multi-type word information, and belongs to the technical field of natural language processing.

Background technique

[0002] Cross-language summarization aims to distill important information from Chinese texts to generate summaries in another language. The Chinese-Vietnamese cross-language summarization task helps Chinese and Vietnamese users obtain each other's text information efficiently and accurately, which is of great significance for promoting information and cultural exchanges between the two countries. Most current cross-language summarization approaches rely on machine translation, and translation quality for low-resource languages such as Vietnamese is poor. In the face of data scarcity, the trained model may generate summary information that does not conform to the original facts, and bilingual semantic alignment is difficult ...


Application Information

Patent Type & Authority: Applications (China)
IPC (8): G06F40/289, G06F40/258, G06F40/242, G06F40/211, G06F40/58, G06N3/04, G06N3/08
CPC: G06F40/289, G06F40/242, G06F40/258, G06F40/58, G06F40/211, G06N3/04, G06N3/08, Y02D10/00
Inventor: 高盛祥, 贾伟强, 张勇丙
Owner: KUNMING UNIV OF SCI & TECH