System and method for extracting picture abstract based on page partitioning

A picture abstract (image summary) extraction technology based on page segmentation. It addresses the problems that existing retrieval results cannot meet users' retrieval needs and lack intuitive page information, and it aims for a good display effect, high extraction accuracy, and improved performance.

Status: Inactive
Publication Date: 2011-01-12
SOUTH CHINA UNIV OF TECH +1
6 Cites, 33 Cited by

AI Technical Summary

Problems solved by technology

Current retrieval systems mostly provide only page titles, text abstracts, and page snapshots, which will inevitably fail to meet users' growing search needs.
A text abstract extracts the most relevant text fields from the page text as a summary of the page. Although such a summary can fill some of the gaps in the user's understanding of the page, it lacks intuitive information about the page's content.

Examples

Embodiment

[0084] Preferably, step S6 specifically includes the following steps:

S6.1. The topic block recognition module constructs a vector space model in which each text, after TF-IDF weighting, is represented as a vector in the vector space. The vector formed by TF-IDF weighting of the text is the topic vector, and the vector formed by TF-IDF weighting of the text in the entire web page is the document vector.

S6.2. The topic block recognition module calculates the similarity between the topic vector and the document vectors, sorts all document vectors by similarity, and takes the document vector with the highest similarity to the topic vector as the topic block. The topic block recognition module then sends the topic block to the information extraction module.

Preferably, step S7 specifically includes the following ...
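
The TF-IDF weighting and similarity ranking described in steps S6.1 and S6.2 can be illustrated with a minimal Python sketch. Several details are assumptions that the text does not specify: the similarity measure is taken to be cosine similarity, the IDF uses a smoothed logarithm, the tokenizer is a simple lowercase word splitter, and the names `tokenize`, `tfidf_vectors`, `cosine`, and `rank_blocks` are hypothetical. The sketch also assumes that the topic vector comes from a short topic text (for example a page title) and that each candidate block contributes one vector; the patent text itself may define these inputs differently.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokenizer (an assumption; the source names no tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(texts):
    """Return one sparse TF-IDF vector (dict: term -> weight) per input text."""
    token_lists = [tokenize(t) for t in texts]
    n = len(token_lists)
    # Document frequency: in how many texts each term occurs.
    df = Counter(term for tokens in token_lists for term in set(tokens))
    vectors = []
    for tokens in token_lists:
        tf = Counter(tokens)
        total = len(tokens) or 1
        vectors.append({
            # Smoothed IDF (an assumed variant) keeps every weight positive.
            term: (count / total) * (math.log((1 + n) / (1 + df[term])) + 1.0)
            for term, count in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_blocks(topic_text, block_texts):
    """Return (similarity, block_index) pairs, most similar block first."""
    vectors = tfidf_vectors([topic_text] + list(block_texts))
    topic_vec, block_vecs = vectors[0], vectors[1:]
    scored = [(cosine(topic_vec, vec), i) for i, vec in enumerate(block_vecs)]
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))


if __name__ == "__main__":
    blocks = [
        "Home About Contact Login",
        "The panda eats bamboo in the mountain forest",
        "Copyright 2011 all rights reserved",
    ]
    ranked = rank_blocks("Giant panda bamboo diet", blocks)
    best_similarity, best_index = ranked[0]
    print(best_index, round(best_similarity, 3))
```

In this sketch the top-ranked block plays the role of the topic block that would be passed on to the information extraction module; ties are broken by block order, which is an arbitrary choice.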

Abstract

The invention discloses a system for extracting a picture abstract based on page partitioning, which comprises a page preprocessing module, a page sorting module, a page partitioning module, a subject block identifying module, and an information extracting module. The invention also discloses a method for extracting a picture abstract based on page partitioning, which comprises the following steps:
1. A page is crawled from the Internet.
2. The page preprocessing module preprocesses the page.
3. The page sorting module sorts the preprocessed page by type (subject-type or non-subject-type).
4. The page partitioning module partitions the page into semantic blocks.
5. The page partitioning module sends a subject-type page to the subject block identifying module, which identifies the subject block and sends it to the information extracting module; the page partitioning module sends a non-subject-type page directly to the information extracting module.
6. The information extracting module downloads the picture and associates it with the page.
According to the disclosure, the system and method offer high extraction speed, high accuracy, and a good display effect.
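
The routing between modules described in the abstract can be sketched in Python as follows. This is only an illustration of the control flow, not the patented implementation: the `Page` class, the `run_pipeline` function, and all parameter names are hypothetical, each module is passed in as a plain callable, and the crawling step is assumed to have already produced the page. The sketch also assumes that a non-subject-type page hands all of its blocks to the information extracting module.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Page:
    """Hypothetical container for a crawled page and the pipeline's results."""
    url: str
    html: str
    is_subject_type: bool = False
    blocks: List[str] = field(default_factory=list)
    pictures: List[str] = field(default_factory=list)


def run_pipeline(
    page: Page,
    preprocess: Callable[[str], str],
    sort_page: Callable[[str], bool],
    partition: Callable[[str], List[str]],
    identify_subject_block: Callable[[List[str]], str],
    extract_pictures: Callable[[List[str]], List[str]],
) -> Page:
    """Run steps 2 to 6 of the abstract on an already crawled page."""
    page.html = preprocess(page.html)              # step 2: preprocessing
    page.is_subject_type = sort_page(page.html)    # step 3: page sorting
    page.blocks = partition(page.html)             # step 4: semantic blocks
    if page.is_subject_type:
        # Step 5, subject-type page: only the identified subject block
        # reaches the information extracting module.
        candidates = [identify_subject_block(page.blocks)]
    else:
        # Step 5, non-subject-type page: assumed here to pass every block on.
        candidates = page.blocks
    page.pictures = extract_pictures(candidates)   # step 6: pictures for the page
    return page
```

Passing the modules as callables keeps the sketch independent of any particular preprocessing, sorting, or partitioning technique, none of which the abstract specifies.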

Description

Technical field

[0001] The invention relates to the technical field of picture abstract extraction, and in particular to a picture abstract extraction system and method based on page segmentation.

Background art

[0002] With the rapid development of informatization, organizations such as enterprises, governments, and schools have multiple sources of information both inside and outside. The amount of information on the Internet is huge, and the number of knowledge documents within organizations is also growing explosively.

[0003] Current retrieval systems mostly provide only page titles, text abstracts, and page snapshots. This will inevitably fail to meet users' increasing search requirements. A text abstract extracts from the page text the text fields most relevant to the search terms, as a kind of abstract of the page. Although the abstract can fill the user's cognitive blind spots about the page to a certain extent, it lacks th...

Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/30
Inventor: 董守斌, 张朝斌, 张凌, 李粤, 袁华
Owner: SOUTH CHINA UNIV OF TECH