
Unsupervised multi-model fusion extraction type text abstracting method

A multi-model, extractive technique in the field of information extraction. It addresses three problems: the inability to take the semantic information of sentences into account, the inability to describe article content accurately and comprehensively, and information redundancy.

Pending Publication Date: 2020-10-02
NANJING SILICON INTELLIGENCE TECH CO LTD
Cites: 5; Cited by: 5

AI Technical Summary

Problems solved by technology

[0003] Existing extractive text summarization technology cannot take the semantic information of sentences into account. Its extracted results are too uniform, contain redundant information, and omit some important information, and they cannot accurately and comprehensively describe the content of the article.

Embodiment Construction

[0078] The present invention will be described in further detail below in conjunction with the accompanying drawings.

[0079] The present invention provides an unsupervised multi-model fusion extractive text summarization method. As shown in Figure 1, it includes the following steps:

[0080] S1. Before a summary is extracted, the document to be processed must first be preprocessed. Specifically, the document is split into sentences, and each sentence is numbered in sequence. Each sentence is then segmented into words: the NLTK tool can be used for English and the jieba tool for Chinese. Finally, stop words and invalid symbols are removed. Stop words include modal particles, punctuation marks, articles, function words, and other words that carry no practical meaning or have essentially no effect on the meaning of a sentence.
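A minimal Python sketch of step S1 follows. To stay self-contained it replaces NLTK and jieba with a simple regular-expression tokenizer and uses a small, hypothetical stop-word list (`STOP_WORDS`); the function name `preprocess` is also illustrative, not from the patent. The regex tokenizer does not segment Chinese, so Chinese input would still need a segmenter such as jieba (for example `jieba.lcut`).

```python
import re
from typing import List, Tuple

# Hypothetical stop-word list; a real system would load NLTK's list for
# English or a Chinese stop-word file used together with jieba.
STOP_WORDS = {"a", "an", "the", "is", "are", "it", "of", "on", "and", "to", "in", "oh"}


def preprocess(document: str) -> List[Tuple[int, str, List[str]]]:
    """Split a document into numbered sentences with cleaned token lists."""
    # Naive sentence split after ., !, ? and their full-width Chinese forms.
    pieces = re.split(r"(?<=[.!?。！？])\s*", document.strip())
    sentences = [p.strip() for p in pieces if p.strip()]
    result = []
    for number, sentence in enumerate(sentences, start=1):
        # Stand-in for NLTK/jieba: keep runs of word characters, which also
        # drops punctuation and other invalid symbols. Chinese runs are NOT
        # segmented into words here.
        tokens = re.findall(r"\w+", sentence.lower())
        tokens = [t for t in tokens if t not in STOP_WORDS]
        result.append((number, sentence, tokens))
    return result


for number, sentence, tokens in preprocess(
        "The cat sat on the mat. Oh, it is a sunny day!"):
    print(number, tokens)
# 1 ['cat', 'sat', 'mat']
# 2 ['sunny', 'day']
```

Keeping the sentence number alongside the tokens preserves each sentence's position, which later steps can use when ordering the summary.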

[0081] S2. Train and optimize the centrality text summary model in advance, and calculate the first batch of summaries s...
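Step S2 is truncated in this excerpt, and the source does not say how the centrality model is built, trained, or optimized. As one plausible illustration, assuming a LexRank/TextRank-style graph centrality, each sentence can be scored by PageRank over a cosine-similarity graph of its tokens, and the top-k sentences kept in document order as the first batch. The names `centrality_scores`, `summary1`, `damping`, and `iters` are hypothetical, and the patent's trained model may differ substantially.

```python
import math
from collections import Counter
from typing import List


def cosine(a: List[str], b: List[str]) -> float:
    """Cosine similarity between two bag-of-words token lists."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0


def centrality_scores(tokens: List[List[str]], damping: float = 0.85,
                      iters: int = 50) -> List[float]:
    """PageRank-style centrality of each sentence in a similarity graph."""
    n = len(tokens)
    if n == 0:
        return []
    # Edge weights: pairwise cosine similarity, no self-loops.
    sim = [[cosine(tokens[i], tokens[j]) if i != j else 0.0 for j in range(n)]
           for i in range(n)]
    out_weight = [sum(row) for row in sim]
    scores = [1.0 / n] * n
    for _ in range(iters):
        # Each sentence j passes its score to neighbours in proportion to
        # edge weight; an isolated sentence only keeps the teleport term.
        scores = [
            (1 - damping) / n
            + damping * sum(sim[j][i] / out_weight[j] * scores[j]
                            for j in range(n) if out_weight[j] > 0)
            for i in range(n)
        ]
    return scores


def summary1(tokens: List[List[str]], k: int) -> List[int]:
    """Indices of the k most central sentences, returned in document order."""
    scores = centrality_scores(tokens)
    top = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(top)


tokens = [["cat", "sat", "mat"], ["cat", "mat"], ["dog", "ran"], ["cat", "sat"]]
print(summary1(tokens, 3))  # [0, 1, 3]
```

The off-topic sentence 2 shares no words with the others, so it receives only the teleport score and drops out of the summary.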

Abstract

The invention relates to the field of information extraction and discloses an unsupervised multi-model fusion extractive text summarization method. It addresses the problems of existing extractive text summarization technology, which cannot take the semantic information of sentences into account and whose extracted results cannot accurately and comprehensively describe article content. The technical scheme is as follows: a centrality text summarization model is trained and optimized in advance, and the optimized model is applied to the preprocessed document to obtain a first batch of summaries, summary1; a semantic similarity capture model is applied to the preprocessed document to obtain a second batch of summaries, summary2; summary1 and summary2 are fused to obtain a candidate summary, middle_summary; and the MMR algorithm is applied to middle_summary to obtain the final summary, final_summary. By understanding and analyzing the text content semantically through multi-model fusion and fully considering the position information of sentences, the method can accurately calculate the importance of each sentence and improve the accuracy, flexibility, and diversity of the summary results.
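The fusion and MMR steps described in the abstract can be sketched as follows. The abstract does not specify the fusion rule, so the hypothetical `fuse` simply assumes the union of both batches in document order. MMR (Maximal Marginal Relevance) then greedily adds the candidate s that maximizes λ·Rel(s) − (1 − λ)·max over already selected t of Sim(s, t), trading relevance against redundancy. Here Rel is assumed to be cosine similarity to the whole document, and the parameter `lam` (λ) and summary length `k` are illustrative choices, not values taken from the patent.

```python
import math
from collections import Counter
from typing import List, Sequence


def cosine(a: Sequence[str], b: Sequence[str]) -> float:
    """Cosine similarity between two bag-of-words token lists."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0


def fuse(summary1: List[int], summary2: List[int]) -> List[int]:
    """Hypothetical fusion rule: union of both batches, in document order."""
    return sorted(set(summary1) | set(summary2))


def mmr(candidates: List[int], tokens: List[List[str]], k: int,
        lam: float = 0.7) -> List[int]:
    """Greedy Maximal Marginal Relevance selection; returns sentence indices."""
    # Assumption: relevance = similarity to the bag of words of the whole document.
    doc = [t for sent in tokens for t in sent]
    relevance = {i: cosine(tokens[i], doc) for i in candidates}
    selected: List[int] = []
    remaining = list(candidates)

    def marginal(i: int) -> float:
        # Redundancy = highest similarity to any sentence already selected.
        redundancy = max((cosine(tokens[i], tokens[j]) for j in selected),
                         default=0.0)
        return lam * relevance[i] - (1 - lam) * redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=marginal)
        selected.append(best)
        remaining.remove(best)
    # Return in document order so the summary reads naturally.
    return sorted(selected)


tokens = [["cat", "sat", "mat"], ["cat", "sat", "mat"],
          ["dog", "ran", "park"], ["cat", "dog"]]
candidates = fuse([0, 1], [1, 2])               # [0, 1, 2]
print(mmr(candidates, tokens, k=2, lam=0.5))    # [0, 2]
```

With `lam=1.0` redundancy is ignored, so the duplicate sentence 1 would be chosen next to sentence 0; lowering `lam` penalizes it and lets the less relevant but novel sentence 2 in.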

Description

Technical field

[0001] The invention relates to the field of information extraction, and more specifically to an unsupervised multi-model fusion extractive text summarization method.

Background art

[0002] As the pace of life accelerates, people's patience for reading text decreases. When people need to read a long news article or a lengthy academic paper, they often lose patience because the text is too long. Therefore, to speed up reading, technologies that intelligently extract the important information from articles have appeared on the market, allowing people to grasp the key information in an article quickly, save reading time, and read more efficiently.

[0003] Existing extractive text summarization technology cannot take the semantic information of sentences into account. Its extracted results are too uniform, contain redundant information, and omit some important information, and the extra...

Application Information

Patent type & authority: Applications (China)
IPC (8): G06F16/34; G06F40/30; G06K9/62
CPC: G06F16/345; G06F40/30; G06F18/22; G06F18/214; G06F18/25; Y02D10/00
Inventors: 司马华鹏 (Sima Huapeng), 靳超超 (Jin Chaochao), 姚奥 (Yao Ao)
Owner: NANJING SILICON INTELLIGENCE TECH CO LTD