Internet big data cleaning method

A data cleaning and big data technology, applied in the field of data cleaning, that addresses the low efficiency of data screening and cleaning, improves cleaning accuracy, and reduces the workload of data collection.

Pending Publication Date: 2020-01-31
广州宏数科技有限公司

AI Technical Summary

Problems solved by technology

[0003] In view of this, the present invention provides a method for cleaning Internet big data to


Examples

Example Embodiment

[0032] The following examples further illustrate the present invention but do not limit it in any form. Unless otherwise specified, the reagents, methods and equipment used in the present invention are conventional reagents, methods and equipment in the technical field.

[0033] As shown in Figures 1-2, this embodiment discloses a method for cleaning Internet big data, which includes the following steps:

[0034] S1. Use the data acquisition module 1 to log in to the target server through the HTTP protocol, and use regular expressions, XPath expressions and JSONPath expressions to extract the required data. HTTP is a simple request-response protocol that usually runs over TCP. It specifies what kinds of messages the client may send to the server and what kinds of responses it receives. The headers of the request and response messages are given in ASCI...
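The three extraction styles named in step S1 can be illustrated with a minimal sketch. It is not the patent's implementation: the function names and sample payloads are hypothetical, the HTTP login is omitted so the example runs offline on hard-coded response bodies, the XPath call uses only the limited XPath subset of Python's standard `xml.etree.ElementTree`, and the dotted-path helper is a simplified stand-in for a real JSONPath expression.

```python
import json
import re
import xml.etree.ElementTree as ET


def extract_with_regex(text, pattern):
    """Return every match of a regular expression in raw response text."""
    return re.findall(pattern, text)


def extract_with_xpath(xml_text, path):
    """Return the text of each element matched by an ElementTree XPath subset."""
    root = ET.fromstring(xml_text)
    return [element.text for element in root.findall(path)]


def extract_with_json_path(json_text, dotted_path):
    """Follow a dotted path such as 'data.items.1.price' through parsed JSON.

    Assumption: this is a simplified stand-in for JSONPath, not a full parser.
    """
    node = json.loads(json_text)
    for key in dotted_path.split("."):
        # Numeric segments index into lists; other segments are dict keys.
        node = node[int(key)] if isinstance(node, list) else node[key]
    return node


# Hypothetical response bodies standing in for data fetched over HTTP.
raw_text = "id=101; id=202"
xml_body = "<page><item><title>A</title></item><item><title>B</title></item></page>"
json_body = '{"data": {"items": [{"price": 10}, {"price": 12}]}}'

ids = extract_with_regex(raw_text, r"id=(\d+)")
titles = extract_with_xpath(xml_body, ".//item/title")
price = extract_with_json_path(json_body, "data.items.1.price")
```

In practice each extractor would be applied to the body of an HTTP response, with the expression chosen according to whether the target page returns plain text, markup or JSON.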

Abstract

The invention relates to the technical field of data cleaning, and in particular to an Internet big data cleaning method comprising the following steps: S1, extracting the required data using a data acquisition module; S2, synchronizing the files in the OSS using a crawler synchronization module; S3, packing the processed data using a data cleaning module, and inserting the packed data into a Kafka queue of a KAFKA module; S4, allocating the data to a server queue using the KAFKA module and an election algorithm, and transmitting the data to a database module through a network; and S5, monitoring the data transmitted by the KAFKA module using the database module, and extending the monitoring statistics using a filter-chain. According to the invention, the data cleaning module classifies, integrates and cleans the data into each standardized database module, so that data cleaning accuracy is improved, the low screening and cleaning efficiency caused by data loss in prior-art big data processing is overcome, and data can be screened and cleaned quickly and accurately.
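Step S3 (clean, pack, enqueue) can be sketched as follows. This is only an illustration under stated assumptions: the source does not specify the cleaning rules, so trimming whitespace and dropping empty or duplicate records are assumed here; the function names are hypothetical; and an in-memory `queue.Queue` stands in for the Kafka queue so the example runs without a broker.

```python
import json
import queue


def clean_records(records):
    """Trim string fields, then drop empty and duplicate records (assumed rules)."""
    seen = set()
    cleaned = []
    for record in records:
        normalized = {
            key: value.strip() if isinstance(value, str) else value
            for key, value in record.items()
        }
        # Skip records in which every field is empty or missing.
        if all(value in ("", None) for value in normalized.values()):
            continue
        # A canonical JSON string serves as the duplicate-detection key.
        fingerprint = json.dumps(normalized, sort_keys=True)
        if fingerprint in seen:
            continue
        seen.add(fingerprint)
        cleaned.append(normalized)
    return cleaned


def pack_and_enqueue(records, out_queue):
    """Pack each record as UTF-8 JSON bytes and put it on the queue.

    Assumption: out_queue is an in-memory stand-in for a Kafka queue.
    """
    for record in records:
        message = json.dumps(record, sort_keys=True, ensure_ascii=False)
        out_queue.put(message.encode("utf-8"))
    return out_queue.qsize()


# Hypothetical raw records as they might arrive from the acquisition module.
raw_records = [
    {"name": " Alice ", "city": "Guangzhou"},
    {"name": "Alice", "city": "Guangzhou"},  # duplicate after trimming
    {"name": "", "city": ""},                # empty record
    {"name": "Bob", "city": "Shenzhen"},
]

cleaned = clean_records(raw_records)
message_queue = queue.Queue()
queued = pack_and_enqueue(cleaned, message_queue)
first_message = json.loads(message_queue.get().decode("utf-8"))
```

With a real broker, the `put` call would be replaced by a producer send to a topic, and the allocation across server queues described in S4 would be handled by the cluster rather than by this code.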

Description

Technical field

[0001] The invention relates to the technical field of data cleaning, and more specifically, to a method for cleaning Internet big data.

Background technique

[0002] In the era of information big data, the collection and processing of data has become an urgent problem for information companies to solve. At present, the original data collected through a collection system is also called irregular data: it is mixed with a large amount of useless, disordered and repetitive data, and its format cannot meet the basic requirements of data processing. This makes later modification difficult, and the accuracy of the data is low. In view of this, the data needs to be preprocessed to convert it into the more regular data required for later work, so data cleaning here refers to the basic preprocessing of the data to facilitate subsequent statistical analysis. Make trad...

Application Information

IPC(8): G06F16/215, G06F16/27, G06F16/951, G06F16/9536, G06F21/62, G06K9/62
CPC: G06F16/215, G06F16/951, G06F16/9536, G06F21/6218, G06F16/275, G06F18/241, Y02D10/00
Inventor: 刘磊, 张洪
Owner: 广州宏数科技有限公司