Implementation method of adaptive dynamic web crawler system based on machine learning

The technology concerns dynamic web pages and an implementation method in the computer field. It addresses the problem that dynamic web crawler programs are laborious to write and that the analysis behind them is rarely reusable, and it achieves the effect of reducing that writing workload.

Active Publication Date: 2020-02-18
PEOPLE'S INSURANCE COMPANY OF CHINA

AI Technical Summary

Problems solved by technology

Writing a dynamic web crawler requires developers to analyze the page code and interaction rules of each site, and the workload of this analysis increases exponentially with the complexity of the interaction process and the interaction data.
At the same time, the interaction rules differ from one website page to another, so the analysis done for one website cannot be reused for another, which greatly increases the workload of writing dynamic web crawler programs.


Examples

Embodiment Construction

[0016] The present invention will be described in further detail below in conjunction with the accompanying drawings and specific embodiments, but not as a limitation of the present invention.

[0017] Figure 1 is a schematic structural diagram of an adaptive dynamic web crawler system based on machine learning according to an embodiment of the present invention. As shown in Figure 1, the system includes a dynamic webpage path selection module 11, a dynamic webpage path adaptive training module 12, and a dynamic webpage data capture module 13.

[0018] The dynamic webpage path selection module 11 is configured to obtain the set of all connected interaction paths according to the input information.

[0019] Wherein, the input information of the dynamic webpage path selection module 11 includes one or more of the entry webpage address, the target webpage address, the initial input data used in the intera...
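As a rough illustration of what module 11 might do, the sketch below models a site as a graph of pages and enumerates every connected path from the entry page to the target page. The source does not say how the paths are found; the `CrawlInput` container, the dictionary-based page graph, the depth limit, and the depth-first search are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class CrawlInput:
    """Hypothetical container for the input information of module 11."""
    entry_url: str                                     # entry webpage address
    target_url: str                                    # target webpage address
    initial_data: dict = field(default_factory=dict)   # initial interaction input
    target_schema: dict = field(default_factory=dict)  # target information data structure
    site_scope: list = field(default_factory=list)     # website range list


def connected_paths(graph, entry, target, max_depth=5):
    """Enumerate simple paths from entry to target by depth-first search.

    graph maps a page to the pages reachable from it by one interaction
    (an assumed representation). A path is no longer extended once it
    holds more than max_depth pages.
    """
    paths = []

    def walk(node, path):
        if node == target:
            paths.append(path)
            return
        if len(path) > max_depth:
            return
        for nxt in graph.get(node, []):
            if nxt not in path:  # skip pages already on this path (no cycles)
                walk(nxt, path + [nxt])

    walk(entry, [entry])
    return paths


# Usage with a made-up three-step site.
site = {
    "/home": ["/login", "/search"],
    "/login": ["/search"],
    "/search": ["/result"],
}
job = CrawlInput(entry_url="/home", target_url="/result")
all_paths = connected_paths(site, job.entry_url, job.target_url)
```

Here `all_paths` holds two candidate interaction paths, one through `/login` and one that goes straight to `/search`; these are the kind of candidates the training module would then rank.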

Abstract

The invention discloses an implementation method of an adaptive dynamic web crawler system based on machine learning. A dynamic webpage path selection module obtains the set of all connected interaction paths according to the input information. A dynamic webpage path adaptive training module ranks, in real time, the interaction paths in the set output by the path selection module to form an interaction path list. A dynamic webpage data capture module crawls the dynamic web pages along the n best paths in the interaction path list and feeds the results back to the adaptive training module, which updates the interaction path list. The input information comprises one or more of the following: the entry webpage address, the target webpage address, the initial input data used in the interaction process, the target information data structure, and the list of websites within the crawling range.
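The rank, crawl, and feed-back loop described in the abstract can be sketched as follows. The source does not specify the learning method, so a smoothed success rate stands in for the trained ranking model here; `PathRanker`, `crawl_round`, and the `fetch` callback are hypothetical names introduced only for this illustration.

```python
class PathRanker:
    """Stand-in for the adaptive training module: ranks paths by success rate."""

    def __init__(self, paths):
        # Per path: [successful fetches, attempted fetches].
        self.stats = {tuple(p): [0, 0] for p in paths}

    def score(self, path):
        successes, attempts = self.stats[tuple(path)]
        # Laplace smoothing so that untried paths start at 0.5.
        return (successes + 1) / (attempts + 2)

    def top(self, n):
        """Return the n best-scoring paths (the interaction path list, truncated)."""
        return sorted(self.stats, key=self.score, reverse=True)[:n]

    def feedback(self, path, success):
        """Record one fetch result so that later rankings reflect it."""
        record = self.stats[tuple(path)]
        record[1] += 1
        if success:
            record[0] += 1


def crawl_round(ranker, fetch, n):
    """Fetch along the n best paths and feed each result back to the ranker.

    fetch(path) is an assumed callback that returns the extracted data,
    or None when the path fails to reach the target information.
    """
    results = {}
    for path in ranker.top(n):
        data = fetch(path)
        ranker.feedback(path, data is not None)
        results[path] = data
    return results


# Usage: the path through "B" always fails, the path through "C" succeeds.
ranker = PathRanker([["A", "B"], ["A", "C"]])
outcome = crawl_round(ranker, lambda p: None if "B" in p else {"ok": True}, n=2)
best = ranker.top(1)
```

After one round the failing path scores 1/3 and the working one 2/3, so `best` contains only the path through `"C"`. A real system would replace the success-rate score with whatever model the training module learns.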

Description

Technical field

[0001] The invention relates to computer technology, in particular to a machine learning-based adaptive dynamic web crawler system.

Background

[0002] Because dynamic web pages are highly interactive and exchange complex interaction data, writing a dynamic web crawler program currently requires software developers to analyze the page code and interaction rules of each specific site. The workload of this analysis increases exponentially with the complexity of the interaction process and the interaction data. At the same time, the interaction rules differ from one website page to another, so the analysis done for one website cannot be reused for another, which greatly increases the workload of writing dynamic web crawler programs.

Summary of the invention

[0003] Embodiments of the present invention provide an implementation method of an adaptive dynamic web crawler system based on ma...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F8/20, G06N20/00
CPC: G06F8/22
Inventors: 刘序文, 王鹏, 王和, 邵利铎, 刘苍牧, 孙杰平, 刘晗, 李宏宇
Owner: PEOPLE'S INSURANCE COMPANY OF CHINA