Method for collecting page data in dynamic web page

A page data collection method applied in the field of big data. It addresses the poor collection results and data quality, and the waste of manpower and time, of existing methods. It is described as improving collection accuracy and efficiency, with strong practicability and a wide application range.

Inactive Publication Date: 2015-10-21
INSPUR QILU SOFTWARE IND

AI Technical Summary

Problems solved by technology

[0003] Traditional data collection methods can only obtain the static data in a webpage and cannot handle dynamic data that changes in real time. They waste a great deal of manpower and time, and the collection results and data quality are poor.


Embodiment Construction

[0028] The present invention will be further described below in conjunction with the accompanying drawings and specific embodiments.

[0029] In the prior art, dynamic data in a webpage either cannot be collected, or is collected at a low rate and a high cost. To solve this problem, the present invention provides a page data collection method for dynamic webpages. It is mainly aimed at capturing the growing volume of dynamic data on the Internet, such as news data, BBS data and network public opinion data. The scheme embeds a script parsing environment into a distributed web crawler so that data in dynamic pages can be collected. It builds on Nutch's mature data mining and indexing functions and modifies the operation steps to capture dynamic data efficiently.
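The idea of placing a script parsing step between fetching and parsing can be sketched as follows. This is a minimal illustration and not the patented implementation or Nutch's API: `FAKE_WEB`, `fetch`, `toy_script_environment`, `TextExtractor` and `collect` are all hypothetical names, and the "script environment" is a toy that only understands `document.write('...')` with a literal string. A real system would delegate this step to an actual JavaScript engine.

```python
import re
from html.parser import HTMLParser
from typing import Callable, Dict, List, Optional

# Hypothetical in-memory "web": URL -> raw HTML as a server would send it.
FAKE_WEB: Dict[str, str] = {
    "http://example.test/news": (
        "<html><body><h1>News</h1>"
        "<script>document.write('<p>Breaking: dynamic item</p>');</script>"
        "</body></html>"
    ),
}

_SCRIPT = re.compile(r"<script>(.*?)</script>", re.DOTALL)
_WRITE_CALL = re.compile(r"document\.write\('((?:[^'\\]|\\.)*)'\);?")


def fetch(url: str) -> str:
    """Stand-in for the crawler's fetcher; returns raw, unrendered HTML."""
    return FAKE_WEB[url]


def toy_script_environment(html: str) -> str:
    """Toy substitute for a script parsing environment (assumption).

    Replaces each <script> element with the markup passed to
    document.write('literal'). Anything else in the script is dropped.
    """
    def run(match: "re.Match[str]") -> str:
        return "".join(_WRITE_CALL.findall(match.group(1)))

    return _SCRIPT.sub(run, html)


class TextExtractor(HTMLParser):
    """Collects visible text, ignoring the source code inside <script>."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: List[str] = []
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        if not self._in_script and data.strip():
            self.chunks.append(data.strip())


def collect(url: str,
            render: Optional[Callable[[str], str]] = None) -> List[str]:
    """Fetch a page, optionally run it through a render step, extract text."""
    html = fetch(url)
    if render is not None:
        html = render(html)  # the embedded script parsing step
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.chunks


static_result = collect("http://example.test/news")
dynamic_result = collect("http://example.test/news",
                         render=toy_script_environment)
```

Without the render step, only the text present in the raw HTML is extracted. With it, the content generated by the script is also available to the parser, which is the gap between static and dynamic collection that the paragraph describes.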

[0030] As shown in Figure 1, the specific implementation process is: use the script...


Abstract

The invention discloses a method for collecting page data in a dynamic web page. A script parsing environment is embedded into a distributed web crawler (network spider), and the data in the dynamic web page is collected through the crawler's data exploration, indexing and searching. Compared with the prior art, the method collects different kinds of dynamic data in complete form and stores the data in a database, so that Internet trends can be followed in real time and inaccurate or untimely data collection is avoided. It overcomes the defect of traditional collection methods, which collect the data in a web page only once and cannot collect on demand, and it greatly increases collection accuracy and efficiency. The method has strong practicability, a wide range of application, and is easy to popularize.
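Storing repeatedly collected data in a database, rather than collecting a page only once, might look like the sketch below. The source does not specify a database or schema, so everything here is an assumption made for illustration: SQLite, the `page_snapshot` table, and the `open_store` and `save_if_changed` helpers are hypothetical.

```python
import hashlib
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a snapshot store; the schema is an assumption."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS page_snapshot ("
        " url TEXT NOT NULL,"
        " collected_at TEXT NOT NULL,"
        " content_hash TEXT NOT NULL,"
        " content TEXT NOT NULL)"
    )
    return conn


def save_if_changed(conn: sqlite3.Connection, url: str,
                    content: str, collected_at: str) -> bool:
    """Insert a new snapshot only when the content differs from the last one.

    Returns True if a row was written, False if the page was unchanged.
    """
    digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
    row = conn.execute(
        "SELECT content_hash FROM page_snapshot"
        " WHERE url = ? ORDER BY rowid DESC LIMIT 1",
        (url,),
    ).fetchone()
    if row is not None and row[0] == digest:
        return False
    conn.execute(
        "INSERT INTO page_snapshot VALUES (?, ?, ?, ?)",
        (url, collected_at, digest, content),
    )
    conn.commit()
    return True


# Usage: three collection passes over the same URL, one of them unchanged.
conn = open_store()
first = save_if_changed(conn, "http://example.test/news",
                        "item A", "2015-01-01T00:00:00")
second = save_if_changed(conn, "http://example.test/news",
                         "item A", "2015-01-01T00:05:00")
third = save_if_changed(conn, "http://example.test/news",
                        "item A, item B", "2015-01-01T00:10:00")
snapshot_count = conn.execute(
    "SELECT COUNT(*) FROM page_snapshot").fetchone()[0]
```

Hashing the content keeps the store from filling with identical snapshots when a page is revisited often, while still recording each real change with its collection time.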

Description

Technical field

[0001] The invention relates to the technical field of big data, in particular to a highly practical method for collecting page data from dynamic webpages.

Background technique

[0002] With the rapid development of network technology, the proportion of dynamic pages with embedded JavaScript on the Internet is increasing, which makes page data collection much more difficult. In Internet public opinion analysis and search engine research, static pages are still the main object of page data collection, but the demand for collecting data from dynamic pages is becoming more and more urgent.

[0003] Traditional data collection methods can only obtain the static data in a webpage and cannot handle dynamic data that changes in real time. They waste a great deal of manpower and time, and the collection results and data quality are poor.

[0004] ...

Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/30
CPC: G06F16/951
Inventor: 焦毓葳, 崔乐乐, 王贵友
Owner: INSPUR QILU SOFTWARE IND