Abstract generation method based on single long text

A long-text abstract generation technology, applied to abstract generation from a single long text, aimed at reducing the difficulty and time cost of processing such texts

Pending Publication Date: 2020-10-30
黑龙江阳光惠远知识产权运营有限公司

AI Technical Summary

Problems solved by technology

[0006] To address the problem of massive data volumes encountered in existing patent retrieval, the present invention develops an abstract generation method suitable for a single long text. The method automatically generates a summary of the text, which facilitates further screening and processing of massive amounts of text.

Examples

Specific Embodiment 1

[0053] As shown in Figure 1, the present invention provides a method for generating abstracts based on a single long text, comprising the following steps:

[0054] A method for generating an abstract based on a single long text, comprising the following steps:

[0055] Step 1: For the single long text to be processed, construct a feature vector for each sentence of the text using the BERT algorithm, and determine the cosine similarity between the sentences;

[0056] Step 1 is specifically as follows:

[0057] Step 1.1: For the single long text to be processed, the sent_tokenize() function of the Punkt tokenizer in the NLTK library is used to split the English text into sentences. The sent_tokenize() function of the Punkt tokenizer uses a language-independent, unsupervised method to detect sentence boundaries, which enables it to handle words containing periods accurately;

[0058] Perform word segmentation, case conversion, and removal of stop words, numbers and punctuation on se...
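The preprocessing listed in [0058] can be illustrated with a minimal sketch that uses only the Python standard library. The function name `preprocess` and the tiny `STOP_WORDS` set are assumptions made for illustration; the source does not name the tools or the stop-word list it uses for this step.

```python
import re
from typing import List

# Tiny illustrative stop-word list (an assumption; a real pipeline would use a fuller one).
STOP_WORDS = {"a", "an", "the", "of", "and", "is", "are", "in", "on", "to", "for"}


def preprocess(sentence: str) -> List[str]:
    """Lowercase a sentence, drop digits and punctuation, split it into words,
    and remove stop words."""
    lowered = sentence.lower()
    # Replace everything except ASCII letters and whitespace with a space,
    # which removes digits and punctuation in one pass.
    letters_only = re.sub(r"[^a-z\s]", " ", lowered)
    tokens = letters_only.split()
    return [token for token in tokens if token not in STOP_WORDS]


tokens = preprocess("The method uses 2 BERT models, and it is fast.")
print(tokens)  # ['method', 'uses', 'bert', 'models', 'it', 'fast']
```

The regular expression keeps only ASCII letters, which suits the English text that the embodiment describes; other languages would need a different character class.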

Specific Embodiment 2

[0093] The design process of the method proposed by the present invention is shown in Figure 1. The method builds on the classic TextRank algorithm, as follows:

[0094] Step 1: For a single long text to be processed, represent each sentence as a feature vector using the BERT algorithm, and calculate the cosine similarity between the sentence vectors;
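The cosine similarity in Step 1 can be sketched as follows. The short toy vectors stand in for BERT sentence embeddings, which are not computed here, and the names `cosine_similarity` and `similarity_matrix` are illustrative assumptions. Zeroing the diagonal is also an assumption, made so that the matrix can serve as a sentence graph without self-loops.

```python
import math
from typing import List, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_matrix(vectors: Sequence[Sequence[float]]) -> List[List[float]]:
    """Pairwise cosine similarities, with the diagonal set to 0.0 (no self-loops)."""
    n = len(vectors)
    return [
        [0.0 if i == j else cosine_similarity(vectors[i], vectors[j]) for j in range(n)]
        for i in range(n)
    ]


# Toy 2-dimensional vectors standing in for BERT sentence embeddings.
sentence_vectors = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
sim = similarity_matrix(sentence_vectors)
```

Here sentences 0 and 2 are orthogonal, so their similarity is 0, while sentence 1 is equally similar to both.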

[0095] The method of the present invention is built on the TextRank algorithm. In the classic TextRank algorithm, the similarity between two sentences is measured by content overlap. This approach considers only the overlap between words and ignores the semantic information in the sentence. To take sentence semantics into account, later researchers began to use word embedding methods such as the Word2Vec model or the GloVe model to represent word vectors, and to represent sentence vectors as a weighted average of word vectors. ...
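The two earlier approaches described in [0095] can be contrasted in a short sketch. `overlap_similarity` follows the content-overlap measure of the classic TextRank algorithm (shared words divided by the sum of the logarithms of the sentence lengths). `weighted_average` is a hypothetical helper that builds a sentence vector from word vectors; the toy embeddings and weights are made up for illustration and do not come from Word2Vec or GloVe.

```python
import math
from typing import Dict, List, Sequence


def overlap_similarity(words_a: Sequence[str], words_b: Sequence[str]) -> float:
    """Classic TextRank similarity: shared words / (log|A| + log|B|)."""
    if not words_a or not words_b:
        return 0.0
    denominator = math.log(len(words_a)) + math.log(len(words_b))
    if denominator == 0.0:  # both sentences contain exactly one word
        return 0.0
    shared = len(set(words_a) & set(words_b))
    return shared / denominator


def weighted_average(
    words: Sequence[str],
    embeddings: Dict[str, List[float]],
    weights: Dict[str, float],
) -> List[float]:
    """Sentence vector as the weighted mean of its known word vectors."""
    dim = len(next(iter(embeddings.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for word in words:
        if word in embeddings:
            weight = weights.get(word, 1.0)  # unweighted words count once
            weight_sum += weight
            for k in range(dim):
                total[k] += weight * embeddings[word][k]
    if weight_sum == 0.0:
        return total
    return [x / weight_sum for x in total]


a = ["car", "drives", "fast"]
b = ["automobile", "moves", "quickly"]  # same meaning as a, no shared words
c = ["car", "moves", "fast"]

print(overlap_similarity(a, b))  # 0.0, although the sentences are paraphrases
print(overlap_similarity(a, c))  # 2 shared words / (2 * ln 3)

toy_embeddings = {"car": [1.0, 0.0], "fast": [0.0, 1.0]}
toy_weights = {"car": 3.0, "fast": 1.0}
print(weighted_average(["car", "fast"], toy_embeddings, toy_weights))  # [0.75, 0.25]
```

The first comparison shows the limitation that the paragraph points out: two paraphrases with no words in common score zero under content overlap.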

Abstract

The invention discloses an abstract generation method based on a single long text and relates to the technical field of abstract generation for a single long text. The method comprises the following steps: constructing a feature vector for each sentence of the text using the BERT algorithm, and determining the cosine similarity between sentences; determining weight scores for the sentences, and correcting these weights according to each sentence's position, length and similarity to the title; performing redundancy processing on the candidate abstract sentences using the MMR algorithm; and extracting the abstract of the patent text from the candidate abstract sentences after redundancy processing. The abstracts obtained with the disclosed method score higher on every evaluation index than those of other existing methods of the same kind. Applied to the technical field of patent retrieval, the abstract generation method can effectively improve the efficiency and accuracy of patent noise reduction.
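The scoring and redundancy steps named in the abstract can be sketched as follows, assuming a precomputed sentence similarity matrix. `textrank_scores` uses the standard weighted TextRank update with a damping factor of 0.85, and `mmr_select` adapts maximal marginal relevance by using the normalized sentence score as the relevance term; both function names, the parameter `lam` and the toy matrix are assumptions. The weight correction by position, length and title similarity is omitted because this excerpt does not give its formula.

```python
from typing import List, Sequence


def textrank_scores(
    sim: Sequence[Sequence[float]], damping: float = 0.85, iterations: int = 50
) -> List[float]:
    """Weighted TextRank: each sentence shares its score with its neighbours
    in proportion to the edge weights."""
    n = len(sim)
    # Total outgoing edge weight of each sentence, ignoring self-loops.
    out_weight = [sum(sim[j][k] for k in range(n) if k != j) for j in range(n)]
    scores = [1.0] * n
    for _ in range(iterations):
        updated = []
        for i in range(n):
            rank = 0.0
            for j in range(n):
                if j != i and out_weight[j] > 0.0:
                    rank += sim[j][i] / out_weight[j] * scores[j]
            updated.append((1.0 - damping) + damping * rank)
        scores = updated
    return scores


def mmr_select(
    scores: Sequence[float], sim: Sequence[Sequence[float]], k: int, lam: float = 0.5
) -> List[int]:
    """Greedily pick k sentences, trading relevance against redundancy."""
    top = max(scores)
    relevance = [s / top for s in scores]  # scale so the best sentence has relevance 1.0
    selected: List[int] = []
    candidates = set(range(len(scores)))
    while candidates and len(selected) < k:

        def mmr(i: int) -> float:
            # Redundancy is the highest similarity to any sentence already chosen.
            redundancy = max((sim[i][j] for j in selected), default=0.0)
            return lam * relevance[i] - (1.0 - lam) * redundancy

        # Sorting makes ties resolve to the lowest sentence index.
        best = max(sorted(candidates), key=mmr)
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy similarity matrix: sentences 0 and 1 are near-duplicates.
sim = [
    [0.0, 0.95, 0.4, 0.1],
    [0.95, 0.0, 0.4, 0.1],
    [0.4, 0.4, 0.0, 0.2],
    [0.1, 0.1, 0.2, 0.0],
]
scores = textrank_scores(sim)
summary_ids = mmr_select(scores, sim, k=2)
print(summary_ids)  # [0, 2]
```

Ranking by score alone would return the near-duplicate pair 0 and 1. The redundancy penalty replaces sentence 1 with sentence 2, which is the effect that the MMR step is meant to achieve.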

Description

Technical field

[0001] The invention relates to the technical field of patent abstract generation, and in particular to an abstract generation method based on a single long text.

Background

[0002] With the advent of the information age, people depend more and more on the Internet to obtain the information they need, but the information on it is growing explosively. How to effectively screen useful information out of this massive volume has become the key problem, and the field of single long texts faces similar problems. As the most effective carrier of technical information, patent documents contain more than 90% of the world's latest technical information, which appears there 5-6 years earlier than in general technical publications, and 70%-80% of inventions are disclosed only through patent documents and are not found in other scientific and technological literature. Compared with other forms of documents, patents ...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/34, G06F40/279, G06Q50/18
CPC: G06Q50/184, G06F16/345, G06F40/279
Inventor: 石振锋, 王亚卓, 崔宝艳, 桑略
Owner: 黑龙江阳光惠远知识产权运营有限公司