Web forum information extraction system

An information extraction technology for Web forums, applied in the field of Web information processing. It addresses problems such as low accuracy, heavy reliance on manual work, and failure to meet the needs of practical applications, and achieves high accuracy, reduced cost, and strong versatility.

Status: Inactive. Publication date: 2010-06-09
THE PLA INFORMATION ENG UNIV
Cites: 0; Cited by: 31
AI Technical Summary

Problems solved by technology

Because this method does not require manually labeled samples, it is suitable for extracting information from a large number of sites and web pages, but its accuracy is relatively low.
[0007] Because there is a huge number of forums on the Internet and each forum has its own style, the existing methods for Web forum information extraction all have shortcomings to some degree: methods 1 and 2 require a large amount of manual participation and cannot meet the needs of practical applications; method 3 achieves automatic extraction, but its accuracy is relatively low.

Examples

Embodiment Construction

[0021] The present invention will be further described below in conjunction with the accompanying drawings and embodiments.

[0022] As shown in Figure 1, the system of the present invention comprises the following modules:

[0023] The Web forum webpage collection module 101 automatically downloads forum webpages according to the forum site and the corresponding boards specified by the user; this collection module makes use of the content extracted by the extraction module. The webpage analysis module 102 cleans the webpage content so that it conforms to the HTML specification, and parses the webpage to build its Document Object Model (DOM). The online extraction module 103 extracts the specified information from the webpage according to the structural characteristics of forum webpages and the characteristics and statistical regularities of the information to be extracted. The database storage module 104 stores the extracted content in the data...
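
A minimal sketch of how the four modules could be chained, using only Python's standard library. Everything here is an illustrative assumption rather than the patented implementation: the fetcher is injected (a dict stands in for HTTP downloads), the tolerant tree builder stands in for the cleaning and DOM-building step of module 102, and `run_pipeline`, `demo_extract`, the `posts` table and the `post`/`next` CSS class names are hypothetical. The statement that the collection module uses extracted content is modeled, again as an assumption, by having the extractor return next-page links that are queued for download.

```python
import sqlite3
from collections import deque
from dataclasses import dataclass, field
from html.parser import HTMLParser

VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}  # never have children

@dataclass
class Node:
    tag: str
    attrs: dict = field(default_factory=dict)
    children: list = field(default_factory=list)
    text: str = ""

class DomBuilder(HTMLParser):
    """Module 102 (sketch): build a well-formed tree from possibly malformed HTML.
    Unclosed elements are closed implicitly; unmatched end tags are ignored."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag, dict(attrs))
        self.stack[-1].children.append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]  # also closes anything left open inside it
                return

    def handle_data(self, data):
        if data.strip():
            self.stack[-1].children.append(Node("#text", text=data))

def parse(html):
    builder = DomBuilder()
    builder.feed(html)
    builder.close()
    return builder.root

def find_all(node, tag, cls=None):
    """Yield nodes with the given tag (and CSS class, if given), depth first."""
    classes = (node.attrs.get("class") or "").split()
    if node.tag == tag and (cls is None or cls in classes):
        yield node
    for child in node.children:
        yield from find_all(child, tag, cls)

def text_of(node):
    """Concatenate the visible text under a node, separated by single spaces."""
    if node.tag == "#text":
        return node.text.strip()
    return " ".join(t for t in map(text_of, node.children) if t)

def run_pipeline(start_urls, fetch, extract, db, max_pages=100):
    """Modules 101-104 chained: download, parse, extract, store."""
    db.execute("CREATE TABLE IF NOT EXISTS posts (url TEXT, content TEXT)")
    queue, seen = deque(start_urls), set()
    while queue and len(seen) < max_pages:
        url = queue.popleft()
        if url in seen:
            continue
        seen.add(url)
        dom = parse(fetch(url))                # 101 download, 102 analyse
        records, next_urls = extract(dom)      # 103 extract
        db.executemany("INSERT INTO posts VALUES (?, ?)",
                       [(url, r) for r in records])  # 104 store
        queue.extend(next_urls)                # extracted links feed collection
    db.commit()

# Usage with two in-memory "pages" (note the unclosed <p> and <b> tags).
PAGES = {
    "p1": '<div class="post">Hello<br>world</div><div class="post">Second</div>'
          '<p><a class="next" href="p2">next</a>',
    "p2": '<div class="post">Third <b>post</div>',
}

def demo_extract(dom):
    posts = [text_of(n) for n in find_all(dom, "div", "post")]
    links = [a.attrs["href"] for a in find_all(dom, "a", "next")]
    return posts, links

db = sqlite3.connect(":memory:")
run_pipeline(["p1"], PAGES.__getitem__, demo_extract, db)
rows = db.execute("SELECT url, content FROM posts ORDER BY rowid").fetchall()
print(rows)
```

Injecting `fetch` and `extract` keeps each module replaceable, which mirrors the module split in Figure 1: a real crawler or a different extraction rule can be swapped in without touching parsing or storage.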

Abstract

The invention discloses a Web forum information extraction system comprising: a webpage acquisition module, which automatically downloads forum webpages according to the forum websites and corresponding boards designated by a user; a webpage analysis module, which cleans the content of the webpages and builds a document object model of each webpage on which the information extraction algorithm operates; an online extraction module, which extracts the designated information from the webpages according to the layout-structure characteristics of forum webpages; and a database storage module, which stores the extracted information in a database system for use by other applications. The system can automatically extract the designated information from various forums on the Internet with high accuracy.
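
The abstract says the online extraction module relies on the layout structure of forum pages but does not spell out the algorithm here. One common heuristic, offered purely as an illustrative assumption and not as the patented method, is that a thread page lays out its posts as repeated sibling elements with the same structure. The sketch below fingerprints each element by its tag and its children's tags, then returns the largest group of same-fingerprint siblings. It assumes the page has already been cleaned into well-formed markup (the job of the webpage analysis module), so `xml.etree.ElementTree` can parse it; `signature`, `find_record_region`, `record_text` and the sample page are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

def signature(el, depth=1):
    """Structural fingerprint: the tag plus its children's tags, down to `depth`."""
    if depth == 0:
        return el.tag
    return el.tag + "(" + ",".join(signature(c, depth - 1) for c in el) + ")"

def find_record_region(root, min_repeats=2):
    """Return the largest group of same-signature siblings anywhere in the tree."""
    best, best_count = [], 0
    for parent in root.iter():
        counts = Counter(signature(child) for child in parent)
        if not counts:
            continue
        sig, count = counts.most_common(1)[0]
        if count >= min_repeats and count > best_count:
            best = [child for child in parent if signature(child) == sig]
            best_count = count
    return best

def record_text(el):
    """All text inside a record, whitespace-normalised."""
    return " ".join(t.strip() for t in el.itertext() if t.strip())

PAGE = """<html><body>
<div class="nav"><a href="/">Home</a><a href="/board">Board</a></div>
<table>
  <tr><td>alice</td><td>First post text</td></tr>
  <tr><td>bob</td><td>Reply <b>one</b></td></tr>
  <tr><td>carol</td><td>Reply two</td></tr>
</table>
</body></html>"""

posts = [record_text(r) for r in find_record_region(ET.fromstring(PAGE))]
print(posts)
```

The navigation bar also repeats (two links), but the three post rows form the larger group, so they are selected. Using only one level of children in the fingerprint keeps inline markup such as `<b>` inside a post from splitting the group; a practical system would combine this with further signals, such as the statistical characteristics of the fields to be extracted that the embodiment mentions.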

Description

Technical field

[0001] The invention relates to the technical field of Web information processing, and in particular to a Web forum information extraction system.

Background art

[0002] With the continuous development of Internet technology, information on the Internet has grown explosively, and Web forums have developed particularly rapidly. According to statistics from the China Internet Network Information Center (CNNIC) at the end of 2008, Web forums had 91 million users, more than 30% of all Internet users. Thousands of people publish information, discuss issues, and exchange opinions in Web forums every day, so over time Web forums have become a huge pool of information resources, and extracting useful information from them effectively is of great significance.

[0003] Web forum information extraction is the part of Web information extraction concerned with extracting certain attributes from web pages, such as extracting ...

Application Information

IPC(8): G06F17/30
Inventors: 李弼程, 王允, 林琛, 郭志刚, 阎红灿
Owner: THE PLA INFORMATION ENG UNIV