A method for automatically extracting abstracts of network articles

An automatic abstract extraction technology, applied in the field of data processing, that addresses problems such as low extraction accuracy, complex operation, and difficult implementation, and achieves a simple, effective approach that is easy to implement.

Active Publication Date: 2019-01-22
优赛恒创科技发展(北京)有限公司 +1

AI Technical Summary

Problems solved by technology

[0012] To solve the above-mentioned problems of low abstract extraction accuracy, difficult implementation, and complicated operation, the present invention provides an automatic abstract extraction method for network articles. The method is simple and efficient while preserving extraction accuracy, so that massive numbers of web articles can be processed efficiently and with high quality in a limited time.




Embodiment Construction

[0077] The specific embodiments of the present invention are further described below with reference to the drawings and examples. The following examples serve only to illustrate the technical solution of the present invention more clearly and do not limit its scope of protection.

[0078] As shown in Figure 1, the present invention describes a method for automatic abstract extraction of network articles, which mainly includes the following steps:

[0079] S1. Obtain the articles and remove noise and advertisements;

[0080] S2. Set the summary length. Statistics on articles that already have abstracts show that most abstracts contain a number of characters in the interval R=[100,200]. If the number of words is too small, important information may not be expressed sufficiently; if the number of words is too large, the effect of a summary of the content cannot be achi...
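
The length constraint in S2 can be sketched as a simple range check. This is a minimal illustration only: the names `SUMMARY_RANGE` and `within_summary_range` are hypothetical, and measuring the interval R with Python's `len` (a character count) is an assumption about how the patent counts length.

```python
# Hypothetical sketch of the S2 length constraint; names are illustrative,
# not taken from the patent text.
SUMMARY_RANGE = (100, 200)  # R = [100, 200] characters, as stated in S2


def within_summary_range(text: str, bounds: tuple[int, int] = SUMMARY_RANGE) -> bool:
    """Return True if the character count of `text` lies in the closed interval."""
    low, high = bounds
    return low <= len(text) <= high
```

A candidate summary that fails this check would be too short to carry the key information or too long to serve as a summary, which is the trade-off S2 describes.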


Abstract

The invention discloses a method for automatically extracting abstracts of network articles, which comprises the following steps: obtaining articles; setting the summary length; extracting keywords and obtaining the weight and part of speech of each keyword; carrying out Chinese part-of-speech tagging on the obtained keywords; obtaining the keyword list Tags; obtaining a title keyword list and a body keyword list; obtaining the list of keywords common to both; performing a weighted average; performing segmentation; obtaining the number of hits and the cumulative weight of each word; and obtaining the final summary. With these steps, the invention can automatically obtain the abstract of an article captured from the network and store the abstract in a database, which provides a basis for subsequent retrieval and display, and the approach is simple, effective, and easy to implement. The invention preserves extraction accuracy while remaining concise and efficient, so that a large number of network articles can be processed in a limited time with high efficiency and high quality.
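
The steps listed in the abstract can be sketched roughly as follows. This is a hedged illustration, not the patented procedure: the visible text does not give the exact formulas, so every function name here is hypothetical, keyword weights are assumed to be supplied by an upstream extractor, "weighted average" is simplified to a plain mean of the title and body weights, "segmentation" is interpreted as sentence splitting, and joining sentences without a separator assumes Chinese text.

```python
import re


def common_keywords(title_kw: dict[str, float], body_kw: dict[str, float]) -> dict[str, float]:
    """Keep keywords present in both lists and average their two weights (assumption)."""
    return {w: (title_kw[w] + body_kw[w]) / 2 for w in title_kw.keys() & body_kw.keys()}


def split_sentences(text: str) -> list[str]:
    """Split after Chinese or Western sentence-ending punctuation (assumption)."""
    parts = re.split(r"(?<=[。！？.!?])\s*", text)
    return [p.strip() for p in parts if p.strip()]


def score_sentence(sentence: str, weights: dict[str, float]) -> tuple[int, float]:
    """Return (number of keyword hits, cumulative keyword weight) for one sentence."""
    hits, total = 0, 0.0
    for word, weight in weights.items():
        count = sentence.count(word)
        hits += count
        total += count * weight
    return hits, total


def summarize(title_kw: dict[str, float], body_kw: dict[str, float],
              body: str, max_chars: int = 200) -> str:
    """Pick the highest-scoring sentences that fit within max_chars, in original order."""
    weights = common_keywords(title_kw, body_kw)
    scored = [(score_sentence(s, weights), i, s)
              for i, s in enumerate(split_sentences(body))]
    # Rank by cumulative weight, then hit count, then position in the article.
    ranked = sorted(scored, key=lambda item: (-item[0][1], -item[0][0], item[1]))
    chosen, length = [], 0
    for (hits, _total), index, sentence in ranked:
        if hits == 0 or length + len(sentence) > max_chars:
            continue  # skip sentences without keywords or that would exceed the limit
        chosen.append((index, sentence))
        length += len(sentence)
    return "".join(sentence for _, sentence in sorted(chosen))
```

The default `max_chars=200` mirrors the upper bound of the interval R=[100,200] stated in step S2; the lower bound is not enforced in this sketch.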

Description

Technical field

[0001] The invention relates to the field of data processing, in particular to an automatic abstract extraction method capable of processing massive numbers of network articles with high efficiency and high quality.

Background technique

[0002] An abstract is also called a summary. It is a short text that expresses the important content of an article concisely and accurately, without comments or supplementary explanations, for the purpose of providing an outline of the article's content.

[0003] With the development and popularization of network technology, the number of new articles generated on the Internet every day has reached the order of one million, which brings new challenges to the retrieval of articles.

[0004] At the same time, because the quality of articles on the Internet varies and most articles do not have an abstract, how to automatically generate abstracts for new articles has become a new topic.

[0005] At present, the ex...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/34, G06F17/27
CPC: G06F40/211, G06F40/258, G06F40/289
Inventor: 鄢军, 袁传义, 徐光杰, 林建波
Owner: 优赛恒创科技发展(北京)有限公司