WEB document-based automatic abstracting method

An automatic document summarization technology, applied in the field of text processing, that addresses rough summaries, users' low efficiency in acquiring information, and the inability to provide users with a concise summary of document content, thereby improving the efficiency and quality of information acquisition.

Status: Inactive; publication date: 2015-02-18
HOHAI UNIV
Cites: 4; cited by: 24

AI Technical Summary

Problems solved by technology

Existing automatic document summarization methods are mainly based on sentence extraction, and the summaries they produce are still relatively rough...

Examples

Embodiment Construction

[0023] In order to make the technical means, creative features, goals and effects achieved by the present invention easy to understand, the present invention will be further described below in conjunction with specific embodiments.

[0024] The invention expands the keyword database, creates a user-oriented non-keyword database, performs word segmentation and feature word extraction on text information, and generates an abstract that can accurately reflect the meaning of the full text.
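Paragraph [0024] does not name a segmentation algorithm. The sketch below assumes dictionary-based forward maximum matching, a common choice for Chinese text, and uses hypothetical names (`Lexicon`, `segment`, `feature_words`) for the keyword bank, the non-keyword bank and the segmentation step. It illustrates how expanding the keyword bank changes which feature words survive; it is not the patented implementation.

```python
from __future__ import annotations

from typing import Iterable


class Lexicon:
    """Hypothetical keyword bank and non-keyword bank (names are illustrative)."""

    def __init__(self, keywords: Iterable[str], non_keywords: Iterable[str]) -> None:
        self.keywords = set(keywords)
        self.non_keywords = set(non_keywords)

    def add_keywords(self, words: Iterable[str]) -> None:
        # e.g. internet neologisms and specialized terms
        self.keywords.update(words)

    def add_non_keywords(self, words: Iterable[str]) -> None:
        self.non_keywords.update(words)


def segment(text: str, lex: Lexicon) -> list[str]:
    """Forward maximum matching (an assumed algorithm): at each position take the
    longest word found in either bank, else fall back to a single character."""
    vocab = lex.keywords | lex.non_keywords
    max_len = max((len(w) for w in vocab), default=1)
    tokens: list[str] = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


def feature_words(text: str, lex: Lexicon) -> list[str]:
    """Keep only keyword-bank tokens; non-keywords and unmatched characters are dropped."""
    return [t for t in segment(text, lex) if t in lex.keywords]
```

For example, with keywords {"自动文摘", "方法", "研究"} and non-keyword {"的"}, segmenting "自动文摘方法的研究" yields the feature words 自动文摘, 方法 and 研究. A neologism such as "网红" is split into single characters and discarded until it is added with `add_keywords`.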

[0025] The invention relates to the technical field of text processing, and in particular to an automatic summarization method based on WEB documents. The method is as follows: taking a URL as input, the HTML DOM (Document Object Model) tag tree is used to capture the WEB document information, and the captured information is then divided into blocks; word segmentation is performed on the captured information to remove meaningless non-keywords; determine the wei...
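As a rough illustration of the capture and block-division steps in [0025], the sketch below uses Python's standard-library `html.parser`. Note that this parser walks the tag stream in document order rather than building a full DOM tree. The block-level tag list, the skipped tags and the phrase delimiters are all assumptions, since the text does not specify them; `extract_blocks` and `split_phrases` are hypothetical names.

```python
from __future__ import annotations

import re
from html.parser import HTMLParser

# Assumption: tags treated as block boundaries (the source does not list them).
BLOCK_TAGS = {"p", "div", "li", "td", "article", "section",
              "h1", "h2", "h3", "h4", "h5", "h6"}
# Assumption: text inside these tags is not page content.
SKIP_TAGS = {"script", "style", "noscript"}
# Assumption: a phrase ends after Chinese or ASCII sentence punctuation;
# an ASCII period only counts when followed by whitespace (keeps "3.14" intact).
PHRASE_END = re.compile(r"(?<=[。！？；!?;])|(?<=\.)\s+")


class BlockExtractor(HTMLParser):
    """Collects the text of each block-level element, in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.blocks: list[str] = []
        self._pending: list[str] = []
        self._skip_depth = 0

    def _flush(self) -> None:
        # Collapse runs of whitespace and store the pending text as one block.
        text = " ".join("".join(self._pending).split())
        if text:
            self.blocks.append(text)
        self._pending = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._pending.append(data)

    def close(self) -> None:
        super().close()
        self._flush()  # emit any trailing text that follows the last block tag


def extract_blocks(html: str) -> list[str]:
    parser = BlockExtractor()
    parser.feed(html)
    parser.close()
    return parser.blocks


def split_phrases(block: str) -> list[str]:
    return [p.strip() for p in PHRASE_END.split(block) if p.strip()]
```

Fetching the page for a given URL is left out; any HTTP client could supply the `html` string. Script and style contents are dropped, and each paragraph-like element becomes one block that can then be divided into phrases.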

Abstract

The invention discloses a WEB document-based automatic abstracting method. The method comprises the following steps:
(1) capturing the text information of a WEB document by using the HTML Document Object Model (DOM) tag tree;
(2) dividing the captured text information into blocks and phrases;
(3) performing word segmentation on the captured text information according to a keyword bank, removing meaningless non-keywords, adding internet neologisms and specialized terms to the keyword bank, and adding non-keywords that have fallen out of use on the web to a non-keyword bank;
(4) calculating the weights of the segmented words and the weights of the phrases and blocks;
(5) selecting the number of blocks and phrases according to the desired level of detail of the abstract, and finally selecting the blocks and phrases with the highest weights to form the document summary.
With this method, WEB document information can be analyzed, the user is given a concise and comprehensive summary of the page content, and the efficiency with which the user acquires information is improved.
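The weighting formulas for steps (4) and (5) are not given in the available text (paragraph [0025] breaks off at "determine the wei..."). The sketch below fills them with simple stand-ins, each stated as an assumption: a word's weight is its relative frequency on the page, a phrase's weight is the sum of its word weights, and a block's weight is the mean weight of its phrases. The hypothetical `detail` parameter in (0, 1] plays the role of the abstract's level of detail by scaling how many blocks and phrases are kept. Input is assumed to be already segmented into (phrase text, feature words) pairs per block.

```python
from __future__ import annotations

import math
from collections import Counter


def summarize(blocks: list[list[tuple[str, list[str]]]], detail: float) -> list[str]:
    """Pick the highest-weighted blocks, then the highest-weighted phrases inside them.

    blocks: page blocks in order; each block is a list of (phrase_text, feature_words).
    detail: hypothetical level-of-detail knob in (0, 1]; larger keeps more text.
    """
    if not 0 < detail <= 1:
        raise ValueError("detail must be in (0, 1]")

    # Assumed word weight: relative frequency of the feature word on the whole page.
    counts = Counter(w for block in blocks for _, words in block for w in words)
    total = sum(counts.values()) or 1
    word_w = {w: c / total for w, c in counts.items()}

    def phrase_w(words: list[str]) -> float:
        # Assumed phrase weight: sum of its word weights.
        return sum(word_w[w] for w in words)

    def block_w(block: list[tuple[str, list[str]]]) -> float:
        # Assumed block weight: mean weight of its phrases.
        return sum(phrase_w(ws) for _, ws in block) / len(block) if block else 0.0

    # How many blocks to keep scales with the requested level of detail.
    n_blocks = max(1, math.ceil(detail * len(blocks)))
    ranked = sorted(range(len(blocks)), key=lambda i: block_w(blocks[i]), reverse=True)
    kept = ranked[:n_blocks]

    # Among the phrases of the kept blocks, keep the same fraction by phrase weight.
    candidates = [(i, j) for i in kept for j in range(len(blocks[i]))]
    n_phrases = max(1, math.ceil(detail * len(candidates)))
    best = sorted(candidates, key=lambda ij: phrase_w(blocks[ij[0]][ij[1]][1]), reverse=True)
    # Return the chosen phrases in their original page order.
    return [blocks[i][j][0] for i, j in sorted(best[:n_phrases])]
```

With `detail=0.5`, the function keeps the higher-weighted half of the blocks (rounded up), then the top half of their phrases, and returns them in page order so the summary reads naturally. Blocks with no feature words, such as copyright lines, score zero and drop out first.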

Description

Technical field

[0001] The invention relates to the technical field of text processing, and in particular to an automatic summarization method based on WEB documents.

Background art

[0002] In the field of text processing, how to obtain information quickly and accurately from massive amounts of WEB information has become a current research hotspot. To improve the efficiency of information acquisition, research on automatic document summarization has emerged and received extensive attention. Such technology can summarize complex and lengthy document content in concise, clear language, which greatly helps users identify and obtain information quickly. Existing automatic document summarization methods are mainly based on sentence extraction, and the summaries they produce are still relatively rough; they cannot provide users with a concise summary of document content, and the efficiency of users to obtain...

Application Information

IPC(8): G06F17/30, G06F17/27
CPC: G06F16/313, G06F16/986
Inventor: 刘文婷 (Liu Wenting)
Owner: HOHAI UNIV