Chinese-Vietnamese cross-language summarization method fusing word-granularity probability mapping information

A Chinese-Vietnamese cross-language summarization technique, applied in natural language translation, neural learning methods, natural language data processing, and related fields. It addresses inaccurate and incomplete information in generated summaries and the inability of models to learn semantic information well.

Pending Publication Date: 2021-12-03
KUNMING UNIV OF SCI & TECH

AI Technical Summary

Problems solved by technology

[0005] The present invention provides a Chinese-Vietnamese cross-language summarization method that integrates word granularity probability mapping information to solve the problem that the Chinese-Vietname

Examples

Embodiment 1

[0062] Embodiment 1: As shown in Figures 1-4, the specific steps of the Chinese-Vietnamese cross-language summarization method fusing word-granularity probability mapping information are as follows:

[0063] Step 1. Corpus collection: obtain Chinese-Vietnamese article-summary data pairs and Chinese-English article-summary data pairs;

[0064] As a further solution of the present invention, said Step1 includes:

[0065] Crawl a Chinese article-summary dataset from the Internet and use Google Translate to translate it into Vietnamese and English, yielding Vietnamese and English article-summary datasets. From the 300,000 Chinese items obtained, 100,000 Chinese-Vietnamese article-summary data pairs and 100,000 Chinese-English article-summary data pairs are obtained through manual screening and alignment. It is divided in...

Abstract

The invention relates to a Chinese-Vietnamese cross-language summarization method fusing word-granularity probability mapping information, and belongs to the technical field of natural language processing. The method comprises the following steps: collecting corpora; performing word segmentation preprocessing on the collected corpora; using the fast-align tool and a statistical approach to obtain Chinese-Vietnamese probability mapping pairs; using an encoder-decoder attention mechanism to obtain keywords from the Chinese article-summary data; constructing a probability mapping mechanism; and fusing word-level probability mapping information. The method first represents Chinese-Vietnamese word-granularity information and the corresponding document-level text; second, based on an attention mechanism, it jointly represents the word-granularity information and the document-level text; finally, it fuses word-granularity alignment information into the target-language summary, improving summary accuracy. Experiments on a Chinese-Vietnamese cross-language summarization dataset demonstrate the effectiveness and superiority of the method.
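
The step that combines fast-align output with statistics can be illustrated with a minimal sketch. It assumes the probability mapping is a relative-frequency estimate of P(target word | source word) over alignment links; the abstract does not specify the exact statistic, so this is an assumption. The function name `build_probability_mapping` and the toy sentences are hypothetical. The input uses fast_align's `source ||| target` line format and its `i-j` (source index, target index) link output.

```python
from collections import Counter, defaultdict


def build_probability_mapping(parallel_lines, alignment_lines):
    """Estimate P(target word | source word) from word-alignment links.

    parallel_lines: "src tokens ||| tgt tokens", one sentence pair per line.
    alignment_lines: space-separated "i-j" links per sentence pair, where i
        indexes a source token and j indexes a target token.
    Relative-frequency counting is an assumption, not the patent's formula.
    """
    pair_counts = defaultdict(Counter)
    for text, links in zip(parallel_lines, alignment_lines):
        src_side, tgt_side = text.split("|||")
        src_tokens = src_side.split()
        tgt_tokens = tgt_side.split()
        for link in links.split():
            i, j = map(int, link.split("-"))
            pair_counts[src_tokens[i]][tgt_tokens[j]] += 1

    # Normalize the link counts of each source word into probabilities.
    mapping = {}
    for src_word, counts in pair_counts.items():
        total = sum(counts.values())
        mapping[src_word] = {tgt: c / total for tgt, c in counts.items()}
    return mapping


# Toy, hand-written data (not taken from the patent's corpus).
parallel = [
    "我 爱 越南 ||| tôi yêu Việt_Nam",
    "我 爱 中国 ||| tôi yêu Trung_Quốc",
    "我 喜欢 越南 ||| tôi thích Việt_Nam",
    "越南 ||| nước Việt_Nam",
]
alignments = ["0-0 1-1 2-2", "0-0 1-1 2-2", "0-0 1-1 2-2", "0-0 0-1"]

mapping = build_probability_mapping(parallel, alignments)
for tgt_word, prob in sorted(mapping["越南"].items(), key=lambda kv: -kv[1]):
    print(tgt_word, prob)
```

In this toy data, 越南 is linked to Việt_Nam three times and to nước once, so the estimated mapping assigns them 0.75 and 0.25. A table of this kind is one plausible form for the word-level mapping pairs that the later fusion step consumes.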

Description

Technical field

[0001] The invention relates to a Chinese-Vietnamese cross-language summarization method that integrates word-granularity probability mapping information, and belongs to the technical field of natural language processing.

Background

[0002] Text summarization aims to generate a short summary from a given long text. Current mainstream summarization tasks target a single language, whereas cross-language summarization aims to generate a summary in another language for a given source-language article. At present, although document-level annotated data for Chinese-Vietnamese cross-language summarization is scarce, word-level alignment data is relatively abundant. Because Chinese and Vietnamese differ in word order, their semantics are difficult to align, and a large amount of labeled data is required for training. In a low-resource language environment, due to the scarcity of parallel data, the cross-language summarization model c...

Application Information

IPC(8): G06F40/58, G06F40/30, G06F40/289, G06F40/242, G06N3/04, G06N3/08
CPC: G06F40/58, G06F40/30, G06F40/242, G06F40/289, G06N3/084, G06N3/048, G06N3/044, G06N3/045
Inventor: 张亚飞, 李笑萌, 郭军军, 高盛祥, 余正涛
Owner KUNMING UNIV OF SCI & TECH