Internet data acquisition method with high matching degree

The method belongs to the fields of data collection and Internet technology, with applications in network data indexing, network data retrieval, and other database retrieval. It addresses poor matching and duplication of captured data, avoids repeated capture, better meets user needs, and has a wide range of application.

Status: Inactive. Publication date: 2019-04-19

河南大瑞物联网科技有限公司

Problems solved by technology

[0002] Internet webpage data collection is the process of obtaining Internet webpage content, generally by means of web crawlers. In existing crawling processes, however, repeated crawling of the same URL, duplication of captured data, and poor matching between captured data often occur. Based on this, we now provide an Internet data collection method with a high matching degree, which extracts the data content required by the user from web pages through analysis, then converts the content and format of the extracted data, processes it, and stores it to meet user needs.

Embodiment Construction

[0011] A method for collecting Internet data with a high degree of matching is implemented as follows: first, a crawl URL list is built by providing the web crawler with the URLs of the websites from which data must be extracted and storing those URLs in the crawl URL list; the web crawler obtains the URL of each website to be processed from the crawl URL list; the web crawler fetches the page content from the corresponding URL and extracts the keyword information required by the user; the web crawler writes the extracted data into the database; finally, a data analysis and comparison module is designed, and the data in the database is processed through this module.
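The five steps in [0011] can be sketched as a small pipeline. This is only an illustrative sketch under stated assumptions, not the patent's implementation: the names `crawl`, `analyse`, and the `hits` table are hypothetical, the page fetcher is passed in as a function so the example runs without network access, and the "analysis and comparison module" is reduced to a query that collapses identical rows.

```python
import re
import sqlite3
from collections import deque
from typing import Callable, Iterable, List, Tuple


def crawl(seed_urls: Iterable[str],
          fetch: Callable[[str], str],
          keywords: Iterable[str],
          db: sqlite3.Connection) -> None:
    """Walk the crawl URL list once and store keyword counts per page."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS hits (url TEXT, keyword TEXT, count INTEGER)"
    )
    keywords = list(keywords)
    queue = deque(seed_urls)              # step 1: the crawl URL list
    while queue:
        url = queue.popleft()             # step 2: take the next URL from the list
        page = fetch(url)                 # step 3: fetch the page content
        # Crude tag stripping; a real parser would be used in practice.
        text = re.sub(r"<[^>]+>", " ", page).lower()
        for kw in keywords:
            pattern = r"\b" + re.escape(kw.lower()) + r"\b"
            n = len(re.findall(pattern, text))
            if n:
                # step 4: write the extracted data into the database
                db.execute("INSERT INTO hits VALUES (?, ?, ?)", (url, kw, n))
    db.commit()


def analyse(db: sqlite3.Connection) -> List[Tuple[str, str, int]]:
    """Step 5 (simplified stand-in): return stored rows with duplicates collapsed."""
    return db.execute(
        "SELECT DISTINCT url, keyword, count FROM hits ORDER BY url, keyword"
    ).fetchall()


if __name__ == "__main__":
    # Hypothetical in-memory "web" so the sketch runs offline.
    pages = {
        "http://a.example/": "<p>Crawler and crawler data</p>",
        "http://b.example/": "<p>nothing here</p>",
    }
    conn = sqlite3.connect(":memory:")
    crawl(list(pages), pages.__getitem__, ["crawler", "data"], conn)
    for row in analyse(conn):
        print(row)
```

Passing `fetch` as a parameter keeps the download step replaceable, so the same loop could be driven by a real HTTP client or by canned pages in a test.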

[0012] The web crawler performs data collection work according to the rules configured in advance by the user, and the configured rules include web page download rules, web page parsing rules, and content extraction rules.
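One way to picture the three rule groups in [0012] is as a small configuration object. The sketch below is an assumption-laden illustration: the class names (`DownloadRule`, `ParseRule`, `ExtractRule`, `CrawlRules`), their fields, and the `apply_rules` helper are all hypothetical and do not come from the patent text.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class DownloadRule:
    """How pages are fetched (would be consumed by the downloader)."""
    timeout_s: float = 10.0
    max_retries: int = 2
    user_agent: str = "example-crawler/0.1"   # hypothetical value


@dataclass(frozen=True)
class ParseRule:
    """How raw bytes are turned into text."""
    encoding: str = "utf-8"
    strip_tags: bool = True


@dataclass(frozen=True)
class ExtractRule:
    """One named field, captured by the first group of a regular expression."""
    name: str
    pattern: str


@dataclass(frozen=True)
class CrawlRules:
    download: DownloadRule = field(default_factory=DownloadRule)
    parse: ParseRule = field(default_factory=ParseRule)
    extract: Tuple[ExtractRule, ...] = ()


def apply_rules(raw: bytes, rules: CrawlRules) -> Dict[str, Optional[str]]:
    """Apply the parsing rules, then every extraction rule, to one downloaded page."""
    text = raw.decode(rules.parse.encoding, errors="replace")
    if rules.parse.strip_tags:
        text = re.sub(r"<[^>]+>", " ", text)
    result: Dict[str, Optional[str]] = {}
    for rule in rules.extract:
        match = re.search(rule.pattern, text)
        result[rule.name] = match.group(1).strip() if match else None
    return result


if __name__ == "__main__":
    rules = CrawlRules(extract=(ExtractRule("price", r"Price:\s*(\d+)"),))
    print(apply_rules(b"<b>Price:</b> 42 USD", rules))
```

Keeping the rules in immutable data objects, separate from the crawler loop, is one plausible reading of "configured in advance by the user": the crawler code stays fixed while the rules change per site.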

[0013]...

Abstract

The invention discloses an Internet data acquisition method with a high matching degree. The method comprises the following steps: first, a crawl URL list is built by providing the web crawler with the URLs of the websites from which data must be extracted and storing those URLs in the crawl URL list; the web crawler obtains the URL of each website to be processed from the crawl URL list; the web crawler fetches the page content from the corresponding URL and extracts the keyword information required by the user; the web crawler writes the extracted data into a database; and a data analysis and comparison module is designed, through which the data in the database is processed. Compared with the prior art, the method processes the data by means of link filtering, data rearrangement, and integration, which eliminates duplicate data and avoids repeated capture. The resulting data is highly integrated and well matched, so the method better meets user requirements, is highly practical, has a wide range of application, and is easy to popularize.
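The abstract names link filtering and the elimination of duplicate data but does not say how they are done. The following sketch shows one common approach, offered purely as an assumption: URLs are normalized before comparison so the same page is not crawled twice, and records are keyed by a hash of their whitespace-normalized text so identical content from several URLs is merged into one entry. The function names are hypothetical.

```python
import hashlib
from typing import Dict, Iterable, List, Tuple
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Lower-case scheme and host, drop the fragment and any trailing slash."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/") or "/"
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, parts.query, "")
    )


def filter_links(urls: Iterable[str]) -> List[str]:
    """Link filtering: keep the first occurrence of each normalized URL."""
    seen = set()
    kept = []
    for url in urls:
        key = normalize_url(url)
        if key not in seen:
            seen.add(key)
            kept.append(key)
    return kept


def merge_records(records: Iterable[Tuple[str, str]]) -> List[Dict[str, object]]:
    """Integration: merge (url, text) records whose normalized text is identical."""
    merged: Dict[str, Dict[str, object]] = {}
    for url, text in records:
        canonical = " ".join(text.split()).lower()
        digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
        entry = merged.setdefault(digest, {"text": text.strip(), "sources": []})
        if url not in entry["sources"]:
            entry["sources"].append(url)
    return list(merged.values())


if __name__ == "__main__":
    print(filter_links([
        "HTTP://Example.com/a/#top",
        "http://example.com/a",
        "http://example.com/b",
    ]))
    print(merge_records([("u1", "Hello  World"), ("u2", "hello world"), ("u3", "other")]))
```

Exact hashing only catches identical text after normalization; near-duplicate detection would need a similarity measure, which the source does not describe.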

Description

Technical field

[0001] The invention relates to the field of computer application technology, in particular to a highly practical method for collecting Internet data.

Background

[0002] Internet webpage data collection is the process of obtaining Internet webpage content, generally by means of web crawlers. In existing crawling processes, however, repeated crawling of the same URL, duplication of captured data, and poor matching between captured data often occur. Based on this, we now provide an Internet data collection method with a high matching degree, which extracts the data content required by the user from web pages through analysis, then converts the content and format of the extracted data, processes it, and stores it to meet the needs of users.

Summary of the invention

[0003] The technical task of the present invention is to provide an Internet data collection method with strong practicability and high matching degree aiming at the abo...

Application Information

Patent Type & Authority: Applications (China)

IPC(8): G06F16/951, G06F16/955, G06F16/903

Inventor: 韩金花

Owner: 河南大瑞物联网科技有限公司
