
Construction method for an automatically and dynamically updating forum crawler system

A dynamic-update crawler system technology, applied in special data processing applications, instruments, electrical digital data processing, etc. It addresses the problems of static updating: the work is time-consuming and labor-intensive, real-time performance is poor, and updates arrive too late to be meaningful. The effects are to avoid static updates, to make system development quicker and easier, and to reduce the cost of system development.

Status: Inactive; Publication Date: 2009-05-20
BEIJING UNIV OF POSTS & TELECOMM
Cites: 0; Cited by: 5

AI Technical Summary

Problems solved by technology

The shortcomings of this method (static updating) are obvious for a forum. First, it is time-consuming and labor-intensive. Static updating re-crawls the webpages that are judged likely to be updated, and this judgment is difficult to make in a forum, where posts on a topic are concentrated in large numbers over a short period and then decay rapidly. The mechanism that judges which pages are likely to be updated therefore has to be revised constantly, which requires a great deal of manual participation and is obviously unrealistic.
Second, the real-time performance is poor. A forum updates a hot topic very quickly, and people are most interested in searching for exactly these fast-updating hot topics. If the static update cannot keep up, then, because the number of webpages that can be crawled is limited, by the time the static update reaches the topic it is likely that no one is paying attention to it any more, so the update loses its meaning even if it is carried out.
At present, there is no systematic and effective method for constructing a crawler that updates rapidly and in real time.



Examples


Embodiment Construction

[0032] To describe the specific implementation more clearly, the idea of automatic dynamic updating is introduced first.

[0033] Forums differ from other websites. They are generally updated in two ways: a new topic is started, or an old topic is continued. All useful information on forum webpages is updated along these two lines. Because forum webpages are dynamically generated, the addresses of the pages within a topic are continuous. It is therefore enough to detect the last webpage of each topic and compare the newly downloaded last webpage with the previously stored last webpage of the same topic to conclude whether an update is required. And because forum webpages link to one another, the webpages of other topics can be updated quickly and in real time while a given topic is being crawled, instead of updating the formulated ones after crawling and...
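The last-page comparison described in [0033] can be sketched as follows. This is only an illustration under stated assumptions, not the patent's implementation: the excerpt does not say how two pages are compared, so a SHA-256 hash of the page text stands in for the comparison, and the names `fingerprint`, `topic_status` and the in-memory `seen` dictionary are hypothetical.

```python
import hashlib


def fingerprint(page_html: str) -> str:
    """Hash the page text so two versions can be compared cheaply.

    Assumption: a content hash is used as the comparison; the source only
    says the new and the previous last page are compared.
    """
    return hashlib.sha256(page_html.encode("utf-8")).hexdigest()


def topic_status(topic_id: str, last_page_html: str, seen: dict) -> str:
    """Classify a topic from its freshly downloaded last page.

    `seen` maps topic id -> fingerprint of the last page stored earlier.
    Returns "new", "updated" or "unchanged", and records the new fingerprint.
    """
    new_fp = fingerprint(last_page_html)
    old_fp = seen.get(topic_id)
    seen[topic_id] = new_fp
    if old_fp is None:
        return "new"  # topic never crawled before
    return "updated" if new_fp != old_fp else "unchanged"
```

A real forum page usually carries changing elements such as view counters or advertisements, so in practice the comparison would probably need to be restricted to the post content rather than the raw HTML.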



Abstract

The invention discloses a construction method for an automatic dynamic update forum crawler system. The method comprises the following steps: firstly, extracting and storing pure web page hyperlinks; secondly, judging the position of topic web pages; thirdly, detecting whether the topic web pages are new or old; fourthly, processing new topic web pages; fifthly, processing old topic web pages; and sixthly, judging and processing the pure web page hyperlink conditions. The method can effectively overcome the defect of static update, automatically update web pages of a forum in real time, provide a general design framework for building the dynamic update forum crawler system, more quickly and conveniently realize system development, and effectively reduce the development cost of the system.
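As a rough sketch of how the first five steps of the abstract might fit together: hyperlinks are extracted from a downloaded page, topic pages are located, each topic is classified as new or old, and the pages to fetch are planned accordingly. Everything concrete here is an assumption made for illustration: the URL scheme `thread-<topic id>-<page number>.html`, the names `TOPIC_URL`, `LinkExtractor`, `extract_links` and `plan_fetches`, and the choice to re-fetch an old topic from its last known page onward. The sixth step is not modelled, because the excerpt does not describe it.

```python
import re
from html.parser import HTMLParser

# Hypothetical URL scheme: thread-<topic id>-<page number>.html
TOPIC_URL = re.compile(r"thread-(\d+)-(\d+)\.html$")


class LinkExtractor(HTMLParser):
    """Step 1: collect the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def extract_links(page_html):
    parser = LinkExtractor()
    parser.feed(page_html)
    return parser.links


def plan_fetches(page_html, last_seen_page):
    """Return (topic_id, page_number) pairs to download.

    `last_seen_page` maps topic id -> highest page number crawled so far
    and is updated in place.
    """
    # Step 2: locate topic pages and find the highest page linked per topic.
    latest = {}
    for href in extract_links(page_html):
        match = TOPIC_URL.search(href)
        if not match:
            continue  # not a topic page under the assumed scheme
        topic, page = match.group(1), int(match.group(2))
        latest[topic] = max(page, latest.get(topic, 0))

    to_fetch = []
    for topic, last_page in sorted(latest.items()):
        known = last_seen_page.get(topic)
        # Step 3: a topic with no stored state is new, otherwise old.
        if known is None:
            start = 1      # Step 4: new topic, fetch every page
        else:
            start = known  # Step 5: old topic, re-fetch from the last known page
        to_fetch.extend((topic, p) for p in range(start, last_page + 1))
        last_seen_page[topic] = last_page
    return to_fetch
```

Re-fetching the last known page of an old topic, rather than starting at the page after it, reflects that the previous last page may have gained replies since the earlier crawl.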

Description

Technical field

[0001] The invention relates to a construction method for a network data collection system, and in particular to a construction method for an automatically and dynamically updating forum crawler system.

Background art

[0002] With the development and popularization of computer technology and the rapid rise of the Internet, people have gradually moved away from traditional forms of communication and now devote a great deal of time and energy to a new form of communication: the forum. The forum is a product of the computer and the Internet, and it has many advantages, such as real-time interaction and broad reach. These advantages lead people to express their opinions, discuss hot issues, and exchange technology and experience on forums. A forum differs from a general portal website in that it is updated very quickly; the discussions of some hot topics in particular are updated extremely quickly, which poses a huge challenge to a forum crawler system. The update...


Application Information

IPC(8): G06F17/30
Inventor: 杨溥, 郭军, 徐蔚然
Owner: BEIJING UNIV OF POSTS & TELECOMM