Data processing method based on web crawlers and structural storage

A structured storage and data processing technology, applied in the fields of network data retrieval, network data indexing, electronic digital data processing, etc. It addresses the problems that collected data is not used comprehensively and that results fail to meet the requirements of the application. Its effects are to reduce data source comparison, improve efficiency, and ensure the accuracy and completeness of data.

Active Publication Date: 2016-10-26
UP WEALTH MANAGEMENT CO LTD

AI Technical Summary

Problems solved by technology

[0002] With the rapid development of the Internet industry, we are in an era of information explosion, surrounded every day by all kinds of useful and useless information. From the perspective of data application, this information is not used comprehensively enough, because some of the data on the market is always non-standardized. If such data is simply grabbed and referenced, the final result may not meet the requirements; even when a large amount of data is processed, it may still fail to meet the requirements of the application.


Examples

Embodiment Construction

[0024] The present invention is described in detail below with reference to the accompanying drawing:

[0025] As shown in Figure 1, a data processing method based on a web crawler and structured storage includes the following steps:

[0026] Step 1: Determine the data source and configure the web crawler system;

[0027] Step 2: Configure the data processing interface according to the characteristics of the data source and the preset metadata structure;
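
Step 2 can be illustrated with a minimal sketch of an interface that maps a raw crawled record onto a preset metadata structure and checks it before storage. The patent text does not give the actual structure, so the schema fields, the `example_site` source name, and the raw key names below are all hypothetical.

```python
from datetime import date

# Hypothetical preset metadata structure: field name -> (type, required).
METADATA_SCHEMA = {
    "title": (str, True),
    "source_url": (str, True),
    "published": (date, True),
    "summary": (str, False),
}

# Hypothetical per-source configuration: raw key -> schema field.
SOURCE_FIELD_MAP = {
    "example_site": {
        "headline": "title",
        "link": "source_url",
        "pub_date": "published",
        "abstract": "summary",
    },
}


def to_structured(source: str, raw: dict) -> dict:
    """Map a raw record to the preset structure and validate it."""
    mapping = SOURCE_FIELD_MAP[source]
    # Keep only the raw keys that the source configuration knows about.
    record = {mapping[k]: v for k, v in raw.items() if k in mapping}
    for field, (ftype, required) in METADATA_SCHEMA.items():
        if field not in record:
            if required:
                raise ValueError(f"missing required field: {field}")
            continue
        if not isinstance(record[field], ftype):
            raise ValueError(f"{field} should be of type {ftype.__name__}")
    return record
```

Keeping the per-source mapping separate from the schema means a new data source only needs a new mapping entry, while the validation logic stays the same.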

[0028] Step 3: Filter and sort the data and files obtained by the web crawler, and filter and sort the page information on each website according to its URL address. Non-duplicate data enters the database and is copied by the system platform. During the copying process, similar news items from within 48 hours are compared by title, by the text at the start of the paragraph, and by the text at the end of the paragraph, or by word segmentation of the text; with a match greater than or equal to 80%, believe the information to record and m...
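
The duplicate check in Step 3 can be sketched as follows. This is one reading of the truncated passage, not the patented implementation: the 48-hour window and the 80% threshold come from the text, while the `NewsItem` type, the 40-character edge length, the whitespace tokenizer (standing in for a real word segmenter), and the use of Jaccard overlap as the similarity measure are assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

WINDOW = timedelta(hours=48)  # comparison window from the text
THRESHOLD = 0.80              # word-segmentation match threshold from the text


@dataclass
class NewsItem:  # hypothetical record type
    title: str
    body: str
    fetched_at: datetime


def tokens(text: str) -> set:
    # Assumption: whitespace splitting stands in for a real word segmenter.
    return set(text.lower().split())


def token_overlap(a: str, b: str) -> float:
    # Assumption: Jaccard overlap is used as the match percentage.
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def is_duplicate(new: NewsItem, old: NewsItem, edge: int = 40) -> bool:
    """Compare two items fetched within 48 hours of each other."""
    if abs(new.fetched_at - old.fetched_at) > WINDOW:
        return False
    # Title, leading text, and trailing text all match ...
    same_edges = (
        new.title == old.title
        and new.body[:edge] == old.body[:edge]
        and new.body[-edge:] == old.body[-edge:]
    )
    # ... or the segmented text matches at 80% or more.
    return same_edges or token_overlap(new.body, old.body) >= THRESHOLD


def deduplicate(items: list) -> list:
    """Keep the earliest copy of each item; drop later duplicates."""
    kept = []
    for item in sorted(items, key=lambda i: i.fetched_at):
        if not any(is_duplicate(item, k) for k in kept):
            kept.append(item)
    return kept
```

The pairwise scan is quadratic in the number of kept items; restricting candidates to those inside the 48-hour window before comparing would reduce the work on large crawls.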

Abstract

The invention relates to a data processing method based on web crawlers and structured storage, in the technical field of computer applications. The method comprises the following steps: step 1, determining the data sources and configuring a web crawler system; step 2, configuring a data processing interface according to the features of the data sources and a preset metadata structure; step 3, screening and deduplicating the data and files obtained by the web crawlers; step 4, routing data and files to different data maintenance interfaces according to indexes. The method has the following beneficial effects: there is no need to deploy a large number of people to track all data sources; data source comparisons are reduced and the deduplication workload is lowered, so that data acquisition efficiency is effectively increased; during data storage, a structured processing method is adopted to standardize the data; and accurate logic verification of data before it enters the database ensures its accuracy and integrity. The invention further discloses a web crawler module.
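
Step 4 of the abstract, routing records to different data maintenance interfaces according to an index, can be sketched as a small dispatch table. The index values (`news`, `announcement`), the handler functions, and the `index` key on each record are hypothetical; the source does not describe the actual interfaces.

```python
from typing import Callable, Dict

# Hypothetical registry: index value -> data maintenance interface.
_INTERFACES: Dict[str, Callable[[dict], str]] = {}


def maintenance_interface(index: str):
    """Register a handler function for one index value."""
    def register(fn: Callable[[dict], str]) -> Callable[[dict], str]:
        _INTERFACES[index] = fn
        return fn
    return register


@maintenance_interface("news")
def maintain_news(record: dict) -> str:
    return f"news:{record['title']}"


@maintenance_interface("announcement")
def maintain_announcement(record: dict) -> str:
    return f"announcement:{record['title']}"


def dispatch(record: dict) -> str:
    """Send a record to the interface registered for its index."""
    handler = _INTERFACES.get(record["index"])
    if handler is None:
        raise KeyError(f"no maintenance interface for index {record['index']!r}")
    return handler(record)
```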

Description

Technical field

[0001] The invention relates to a data processing method based on a web crawler and structured storage, belonging to the technical field of computer applications.

Background

[0002] With the rapid development of the Internet industry, we are in an era of information explosion, surrounded every day by all kinds of useful and useless information. From the perspective of data application, this information is not used comprehensively enough, because some of the data on the market is always non-standardized. If such data is simply grabbed and referenced, the final result may not meet the requirements; even when a large amount of data is processed, it may still fail to meet the requirements of the application.

Contents of the invention

[0003] In order to overcome the above disadvantages, the present invention provides a data processing method based on a web crawler and structured storage.

[0004] The technical scheme that the present invention ...


Application Information

IPC(8): G06F17/30
CPC: G06F16/951
Inventor: 郑文毅, 谢晓勇, 黄俊
Owner UP WEALTH MANAGEMENT CO LTD