Network data acquisition method and system

A network data collection technology, applied in the field of network communication, that addresses problems such as inconvenient data collection and achieves efficient information collection, efficient data cleaning, and timely response to system risks.

Pending Publication Date: 2020-11-17
福建省天奕网络科技有限公司

AI Technical Summary

Problems solved by technology

However, websites on the Internet use different formats, so it is necessary to find what is common across the content of various websites, and many websites set up obstacles that hinder data collection.



Examples


Embodiment Construction

[0027] The present invention will be further described below in conjunction with the accompanying drawings.

[0028] As shown in Figure 1, the network data acquisition method of the present invention comprises the following steps:

[0029] Step S1: define a configuration file and set the parameters for obtaining website data in it. The same configuration file can be used to collect the same fields from different websites, and it can be reused in other projects with few modifications. The parameters for acquiring website data include: the current API address, the current API address type, the website name, the website ID, the website character set, the number of sub-items collected per page, and the maximum number of pages set for the current URL.
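A minimal sketch of what such a configuration file might look like, assuming a JSON file loaded into a Python dataclass. The source does not specify the file format or the field names; `SiteConfig`, `load_config`, and every key and value below are hypothetical and chosen only to mirror the seven parameters listed in Step S1.

```python
import json
from dataclasses import dataclass


@dataclass
class SiteConfig:
    """Hypothetical container for the Step S1 parameters (names are assumptions)."""
    api_url: str         # current API address
    api_type: str        # current API address type
    site_name: str       # website name
    site_id: int         # website ID
    charset: str         # website character set
    items_per_page: int  # number of sub-items collected per page
    max_pages: int       # maximum number of pages set for the current URL


def load_config(text: str) -> SiteConfig:
    """Parse a JSON document into a SiteConfig; unknown or missing keys raise TypeError."""
    return SiteConfig(**json.loads(text))


# Illustrative configuration; the URL and values are placeholders, not from the source.
EXAMPLE_CONFIG = """
{
  "api_url": "https://example.com/api/list",
  "api_type": "json",
  "site_name": "example-site",
  "site_id": 1,
  "charset": "utf-8",
  "items_per_page": 20,
  "max_pages": 5
}
"""

config = load_config(EXAMPLE_CONFIG)
```

Under this assumption, collecting the same fields from another website would mean supplying a second JSON document with different values while the collecting code stays unchanged.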

[0030] Step S2: read the configuration file and collect network data; that is, custom-configure the browser UA identifier for each website (browser UA ...
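The per-website UA configuration in Step S2 could be sketched as follows, assuming the standard-library `urllib.request` module. The mapping `USER_AGENTS`, the helper `build_request`, the UA strings, and the URL are hypothetical; the source does not say how the identifier is stored or applied. The sketch only builds the request and does not send it.

```python
import urllib.request

# Hypothetical per-site UA table; the keys and UA strings are illustrative only.
USER_AGENTS = {
    "example-site": "Mozilla/5.0 (X11; Linux x86_64) ExampleCollector/0.1",
}
DEFAULT_UA = "Mozilla/5.0 (compatible; ExampleCollector/0.1)"


def build_request(site_name: str, url: str) -> urllib.request.Request:
    """Return a Request carrying the UA configured for the site, or a default."""
    user_agent = USER_AGENTS.get(site_name, DEFAULT_UA)
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


# Build (but do not send) a request for a placeholder URL.
request = build_request("example-site", "https://example.com/api/list?page=1")
```

Passing the returned object to `urllib.request.urlopen` would perform the actual fetch.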



Abstract

The invention provides a network data acquisition method comprising the following steps: S1, defining a configuration file and setting the parameters for acquiring website data in the configuration file; S2, reading the configuration file and acquiring network data, namely customizing the browser UA identifier for each website and, using that identifier, collecting the website's network data in a web crawler mode, a timed multi-threaded collection mode, a multi-level collection mode, and a browser cookie storage collection mode; S3, converting special characters of the webpage, namely formatting the network data acquired from the network by string replacement, regular expression replacement or matching, space removal, prefix or suffix addition, date and time formatting, and HTML transcoding; S4, storing the collection result by exporting the data to a local file or storing it in a database. The invention improves acquisition efficiency.
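The formatting operations named in S3 can be illustrated with a short sketch, assuming Python's standard `html`, `re`, and `datetime` modules. The functions `clean_field` and `normalize_date`, the order of the operations, and the date formats are assumptions made for illustration; the source lists the operations but not how they are combined.

```python
import html
import re
from datetime import datetime


def clean_field(raw: str, prefix: str = "", suffix: str = "") -> str:
    """Apply the S3-style operations to one collected field (order is an assumption)."""
    text = html.unescape(raw)                 # HTML transcoding, e.g. "&amp;" becomes "&"
    text = text.replace("\u00a0", " ")        # string replacement: non-breaking space
    text = re.sub(r"<[^>]+>", "", text)       # regular expression replacement: drop tags
    text = re.sub(r"\s+", " ", text).strip()  # space removal: collapse and trim
    return f"{prefix}{text}{suffix}"          # prefix or suffix addition


def normalize_date(raw: str, source_format: str = "%Y/%m/%d %H:%M") -> str:
    """Reformat a date string; both formats here are illustrative choices."""
    parsed = datetime.strptime(raw.strip(), source_format)
    return parsed.strftime("%Y-%m-%d %H:%M:%S")


title = clean_field("  <b>Tom &amp; Jerry</b>&nbsp; ", prefix="title: ")
published = normalize_date(" 2020/11/17 08:30 ")
```

Unescaping before stripping tags means escaped markup is also removed; a collector that needs to preserve literal angle brackets would reverse those two steps.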

Description

Technical field

[0001] The invention relates to the technical field of network communication, in particular to a method and system for collecting network data.

Background technique

[0002] Network data acquisition refers to the process of using Internet search engine technology to capture data in a targeted, industry-specific, and accurate way, classifying the data according to certain rules and screening standards, and forming a database file. Network data collection mainly works by collecting massive amounts of Internet data and, with the help of scientific modeling, listening to consumers, gaining insight into market opportunities, and understanding the dynamics of competing products, so as to guide the company's business decisions on media investment, channel management, brand building, and product innovation. However, the current websites on the Internet have different formats, and it is necessary to find commonality among various website contents,...


Application Information

IPC(8): H04L29/08, G06F16/951
CPC: H04L67/30, G06F16/951
Inventor: 刘德建, 柳旭辉, 张延锋, 郑成龙, 陈宏展
Owner 福建省天奕网络科技有限公司