The following examples further illustrate the present invention, but the examples do not limit the present invention in any form. Unless otherwise specified, the reagents, methods and equipment used in the present invention are conventional in the technical field.
As shown in Figures 1-2, this embodiment discloses a method for cleaning Internet big data, which comprises the following steps:
S1. Use the data acquisition module 1 to log in to the target server over the HTTP protocol, and extract the required data with regular expressions, XPath expressions and JSONPath expressions. HTTP is a simple request-response protocol that usually runs over TCP; it specifies what messages a client may send to a server and what responses it receives. The headers of request and response messages are given in ASCII, and the message content has a MIME-like format. HTTP is an application-layer protocol: like other application-layer protocols, it serves a specific class of application, and its function is realized by programs running in user space. HTTP itself is a protocol specification recorded in documents; it is the implementations of HTTP that actually communicate over the protocol. A regular expression is a logical formula for operating on strings: pre-defined specific characters, and combinations of those characters, form a "pattern string" that expresses a filtering logic to be applied to other strings. An XPath expression belongs to the XML Path Language, a language used to locate a certain part of an XML document. A JSONPath expression is an analogous way to query a JSON document, modeled on XPath expressions. A JSON data structure is usually anonymous and does not necessarily have a root element, so JSONPath uses the abstract symbol $ to represent the outermost object, as in the expression $.store.book.title.
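The three extraction techniques of step S1 can be sketched as follows, using only standard-library facilities; the sample documents, field names and patterns are hypothetical, the standard library supports only a limited XPath subset, and the JSONPath expression is emulated here by plain key lookups rather than a full JSONPath engine:

```python
import re
import json
import xml.etree.ElementTree as ET

# Regular expression: pull a price field out of raw page text.
text = "Item: Widget, Price: 19.99 USD"
match = re.search(r"Price:\s*([\d.]+)", text)
price = match.group(1)

# XPath (the limited subset supported by the standard library):
# locate the title of the first book element in an XML fragment.
xml_doc = "<store><book><title>Guide</title></book></store>"
title = ET.fromstring(xml_doc).find("./book/title").text

# JSONPath-style access: the expression $.store.book.title maps to
# successive key lookups starting from the anonymous root object ($).
json_doc = json.loads('{"store": {"book": {"title": "Guide"}}}')
node = json_doc
for key in "store.book.title".split("."):  # walk the path after "$."
    node = node[key]
```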
S2. Use the crawler synchronization module 2 to synchronize the files in the OSS through a checksum algorithm, a transmission synchronization algorithm and a comparison algorithm. In the fields of data processing and data communication, a checksum algorithm sums a set of data items for verification purposes; the data items may be numbers, or other strings treated as numbers while the checksum is computed. The transmission synchronization algorithm copies the data synchronously during the transmission process, and the comparison algorithm is an algorithm for comparing items of data information.
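A minimal sketch of the checksum idea in step S2; the modulus and sample records are illustrative assumptions, and a production synchronizer would use a stronger digest:

```python
def checksum(data: bytes, modulus: int = 1 << 16) -> int:
    """Sum the byte values of a data block, wrapped to a fixed width.

    A file is considered unchanged when the source and destination
    checksums agree, so only mismatching files need to be re-sent.
    """
    return sum(data) % modulus

local = b"record-1,record-2"
remote = b"record-1,record-2"
needs_sync = checksum(local) != checksum(remote)  # contents match, no re-send
```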
S3. Use the data cleaning module 3 to process the data with the mean filling method, the hot-card filling method and the regression filling method, then package the processed data and insert it into the kafka queue of the KAFKA module 4. Step S3 comprises the following sub-steps: S31, through a distributed data collector configured for the specific task, actively obtain metadata from databases or files, or passively receive the metadata through an API; S32, through the distributed data collector, encapsulate the signature key, the obtained metadata, and the task configuration, including the correspondence between metadata fields and target data fields, the type correspondence and other information, into a task object recognizable by the distributed data processor program, and distribute it to specific machines and worker processes through the distributed task scheduling system of the distributed data processor to perform the cleaning work; S33, the data processor receives the task and parses the task object, first verifying whether the signature key is legal: if not, it discards the task and records a log; if legal, it proceeds to step S34; S34, through the distributed data processor, after the signature key verification passes, restore the metadata and the task configuration contained in the task object, and clean the data according to the correspondences in the configuration; S35, through the distributed data processor, classify the metadata according to the configuration and associate the metadata fields with the target data fields; S36, through the distributed data processor, after the field correspondences have been processed, process the metadata according to the requirements of the target data; S37, through the distributed data processor, perform type conversion on the data types that do not meet the target data requirements; S38, through the distributed data processor, normalize the format of the converted metadata as needed; S39, through the data storage, push the format-normalized metadata to the front-end UI, the back-end API, the message queue, or the database module 5 as needed. The mean filling method divides the attributes in an information table into numeric and non-numeric attributes, which are handled separately. If a null value is numeric, the missing attribute value is filled with the average of that attribute over all other objects; if it is non-numeric, then following the statistical principle of the mode, it is filled with the most frequent value of that attribute among the other objects that share the same decision attribute value as the object.
The hot-card filling method fills a variable containing missing values as follows: find the object in the database most similar to the incomplete one, and fill in the value of that similar object. Different problems may use different criteria for similarity; most commonly, a correlation coefficient matrix determines which variable (say, variable Y) is most correlated with the variable containing the missing value (say, variable X). All records are then sorted by the value of Y, and the missing value of variable X is replaced by the X value of the case immediately preceding it. The regression filling method assumes the y attribute is missing while the x attribute is known: a regression model is trained on the records whose data are complete, the known x value is substituted in, the y attribute is predicted, and the prediction fills the missing position.
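The three imputation methods can be sketched in a few lines; the similarity criterion in the hot-card sketch (distance on an auxiliary column) and the simple least-squares line in the regression sketch are illustrative assumptions, not the only choices the method permits:

```python
def mean_fill(column):
    """Mean filling: replace None entries with the mean of the known values."""
    known = [v for v in column if v is not None]
    mean = sum(known) / len(known)
    return [mean if v is None else v for v in column]

def hot_card_fill(xs, ys, i):
    """Hot-card filling: copy y from the complete row whose x is closest
    to row i's x (one possible similarity criterion)."""
    donor = min(
        (j for j in range(len(ys)) if ys[j] is not None),
        key=lambda j: abs(xs[j] - xs[i]),
    )
    return ys[donor]

def regression_fill(xs, ys):
    """Regression filling: fit y = a*x + b on the complete rows, then
    predict y wherever it is missing."""
    pairs = [(x, y) for x, y in zip(xs, ys) if y is not None]
    n = len(pairs)
    sx = sum(x for x, _ in pairs)
    sy = sum(y for _, y in pairs)
    sxx = sum(x * x for x, _ in pairs)
    sxy = sum(x * y for x, y in pairs)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return [a * x + b if y is None else y for x, y in zip(xs, ys)]
```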
S4. Use the KAFKA module 4 to allocate the data reasonably to the server queues by means of an election algorithm, and transmit the data to the database module 5 over the network. The basic idea of the election algorithm is: when a process P finds that the coordinator no longer responds to requests, it concludes that the coordinator has failed and initiates an election to choose a new coordinator, namely the process with the largest process ID among the currently active processes. Three message types are sent during an election: an Election message, which indicates that an election is initiated; an Answer (Alive) message, which replies to an Election message; and a Coordinator (Victory) message, by which the election winner announces success to the participants. The events that trigger an election are: process P recovering from an error, or a leader failure being detected. The election proceeds as follows: if P has the largest ID, it sends a Victory message to everyone directly and becomes the new leader; otherwise it sends an Election message to all processes with IDs larger than its own. If P receives no Alive message after sending the Election message, P sends a Victory message to everyone and becomes the new leader. If P receives an Alive message from a process with a larger ID, P stops sending messages and waits for the Victory message (if no Victory message arrives within a certain period, it restarts the election). If P receives an Election message from a process with a smaller ID, it replies with an Alive message and then starts its own election. If P receives a Victory message, it regards the sender as the leader.
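The outcome of the election described above (commonly known as the bully algorithm) can be condensed into a sketch; the process IDs and liveness map are hypothetical, and the messaging rounds are collapsed into their net effect, that the highest-ID live process wins:

```python
def bully_election(process_ids, alive, initiator):
    """Return the winner of a bully-style election started by `initiator`.

    `alive` maps each process ID to whether it answers messages. The
    initiator sends Election to all higher IDs; any live higher process
    replies Alive and takes over, so after all rounds the Coordinator
    (Victory) message comes from the largest live ID.
    """
    live = [p for p in process_ids if alive[p]]
    higher = [p for p in live if p > initiator]
    if not higher:
        # No Alive reply arrives: the initiator declares Victory itself.
        return initiator
    # Otherwise higher processes restart the election among themselves
    # until the largest live ID sends the Victory message.
    return max(higher)
```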
S5. Use the database module 5 to monitor whether the data transmitted by the KAFKA module 4 contains SQL injection attacks, filtering and saving through the wallFilter, and use the filter-chain to extend the monitoring statistics. The wallFilter is a data interception and control algorithm; any algorithm that can detect SQL injection attacks in the data, filter them, and save the offending information may serve. The filter-chain is a control algorithm for data monitoring statistics; any algorithm that achieves the purpose of monitoring and collecting statistics on the data may serve.
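A minimal sketch of the wall-filter and filter-chain ideas of step S5; the suspicious patterns are hypothetical examples, and a production wall filter (such as the one in the Druid connection pool) parses the SQL rather than matching strings:

```python
import re

# Hypothetical injection signatures for illustration only.
SUSPICIOUS = [
    re.compile(r"(?i)\bunion\b.*\bselect\b"),   # UNION-based injection
    re.compile(r"(?i)\bor\b\s+1\s*=\s*1"),      # tautology injection
    re.compile(r"--"),                          # trailing comment trick
]

def wall_filter(sql: str) -> bool:
    """Return True when the statement looks like an injection attempt."""
    return any(p.search(sql) for p in SUSPICIOUS)

def run_chain(sql, filters):
    """Filter chain: each stage may reject the statement or pass it on,
    which is also a natural place to hang monitoring statistics."""
    for f in filters:
        if f(sql):
            return "blocked"
    return "allowed"
```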
Specifically, the Internet big data cleaning method employs a data acquisition module 1, a crawler synchronization module 2, a KAFKA module 4 and a database module 5, the data acquisition module 1 being electrically connected to the crawler synchronization module 2, the KAFKA module 4 and the database module 5 respectively; it is characterized in that it also comprises a data cleaning module 3. The data acquisition module 1 collects the target data, saves the collected data to the database module 5, and synchronizes it to the crawler synchronization module 2. The crawler synchronization module 2 regularly synchronizes the data to the local machine and then instructs the data cleaning module 3 to clean the data. The data cleaning module 3 comprises a distributed data collector, a distributed data processor and a data storage. The distributed data collector uses a distributed system to extract and receive data from a variety of sources quickly and in large batches, then pushes it to the distributed data processor for cleaning. The distributed data processor handles the metadata pushed by the distributed data collector, cleans and converts the different data according to its configuration, and pushes the cleaned data to the data storage. The data storage handles the cleaned data and stores it in the database module 5 according to business needs and usage scenarios. The KAFKA module 4 is used for publishing and subscribing to record streams; the database module 5 is used for real-time analysis and storage of the data. The data acquisition module 1 simulates a public business system logging in to the target server over the network, analyzes the routing rules of the target system, and saves the CSS, JS, pictures and page text information in the database module 5.
The crawler synchronization module 2 uses the OSS data synchronization interface to synchronize data from the OSS and sends a cleaning instruction to the data cleaning module 3. The data cleaning module 3 applies various transformations to the data, such as migration, compression, cleaning, scattering, sharding and blocking, and inserts it into the kafka distributed message queue for processing. The distributed data collector comprises an Extract unit that actively collects data and an API unit that passively receives data. The distributed data processor is deployed in a distributed manner and comprises a data verification and classification unit that verifies and classifies the data, a data combining unit that splits or splices the data, a type conversion unit that performs type verification and conversion on the data, and a format specification unit that normalizes the format of the data. The database module 5 composes SQL from the arrays passed in by the data acquisition module 1 and the data cleaning module 3, arranges it into optimal SQL, and filters out SQL attacks.
More specifically, in the embodiment of the present invention, the data collection module 1 logs in to the target server over the HTTP protocol and extracts the required data using techniques such as regular expressions, XPath expressions and JSONPath expressions. The crawler synchronization module 2 uses the checksum algorithm, the transmission synchronization algorithm and the comparison algorithm to synchronize the files in the OSS. The data cleaning module 3 processes the data with algorithms such as the mean filling method, the hot-card filling method and the regression filling method, then packs the processed data and inserts it into the kafka queue. In the embodiment of the present invention, the checksum algorithm, the transmission synchronization algorithm, the comparison algorithm, the mean filling method, the hot-card filling method and the regression filling method are all conventional data processing algorithms, used here to speed up the processing of the collected data information. The ETL data cleaner comprises the following steps:
Step 1. The E module (distributed data collector), according to the specific task configuration, actively obtains metadata from the database or file, or passively receives metadata through the API. Step 2. The E module, according to the specific task configuration, encapsulates the signature key, the obtained metadata, and the task configuration, including the correspondence between metadata fields and target data fields, the type correspondence and other information, into a task object recognizable by the distributed data processor program, and distributes it to specific machines and worker processes through the distributed task scheduling system of the distributed data processor to perform the cleaning work. Step 3. The T module (distributed data processor) receives the task and parses the task object, first verifying whether the signature key is legal: if not, it discards the task and records a log; if legal, it proceeds to step 4. Step 4. The T module, after the signature key verification passes, restores the metadata and task configuration contained in the task object; according to the correspondences in the configuration, the data is processed in steps 5, 6, 7 and 8. Step 5. The T module, after obtaining the metadata, classifies it according to the configuration and associates the metadata fields with the target data fields, then proceeds to step 6. Step 6. The T module, after the field correspondences have been processed, processes the metadata according to the requirements of the target data: if information is missing, it is completed by splicing; if multiple fields must be combined into one field, the fields are merged; if some information needs to be filtered out, it is filtered out.
Step 7. The T module, after the data has been split, spliced and otherwise processed in step 6, performs type conversion on the data types that do not meet the target data requirements. Step 8. The T module, after the type conversion in step 7, has data that largely meets the requirements, and the format is now normalized according to the needs of the data. For example, data provided to the front-end UI for display, data provided to other back-end API interfaces, data stored in a message queue, data stored in a relational database and data stored in a document database all have different format requirements. After the on-demand format normalization, the cleaning of the metadata is complete, and processing proceeds to step 9. Step 9. The L module (data storage) is the link responsible for landing the data; the specific destination varies with business needs. The storage is designed for these requirements and supports pushing to the front-end UI, pushing to the back-end API, pushing to a message queue, storing in a database, and so on. This module also supports plug-in extensions to provide more types of data landing services.
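The task-object flow of steps 2 through 9 can be sketched as follows; the HMAC signature, secret key, field-mapping format and cast table are all illustrative assumptions standing in for the unspecified signing and configuration schemes:

```python
import hashlib
import hmac
import json

SECRET = b"shared-signing-key"  # hypothetical key agreed with the scheduler
CASTS = {"int": int, "float": float, "str": str}

def sign(payload: dict) -> str:
    """Signature over a canonical JSON body (one possible signing scheme)."""
    body = json.dumps(payload, sort_keys=True).encode()
    return hmac.new(SECRET, body, hashlib.sha256).hexdigest()

def make_task(metadata, field_map):
    """E module (step 2): wrap metadata and its field mapping, e.g.
    {"age": ["user_age", "int"]}, into a signed task object."""
    payload = {"metadata": metadata, "field_map": field_map}
    return {"payload": payload, "signature": sign(payload)}

def process_task(task):
    """T module (steps 3-8): verify the signature, then map fields,
    convert types, and normalize the format."""
    if not hmac.compare_digest(sign(task["payload"]), task["signature"]):
        return None  # step 3: illegal signature key, discard (and log)
    meta = task["payload"]["metadata"]
    cleaned = {}
    for src, (dst, cast_name) in task["payload"]["field_map"].items():
        # steps 5-8: rename per the mapping, cast the type, trim the format
        cleaned[dst] = CASTS[cast_name](meta[src].strip())
    return cleaned  # step 9: hand off to the data storage (L module)
```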
The KAFKA module 4 is a kafka distributed message queue which uses the election algorithm to allocate data reasonably to the server queues and transmits the data to the druid database over the network.
The database module 5 is a druid database that uses the wallFilter to monitor whether the data transmitted by kafka contains SQL injection attacks, filtering and saving accordingly, and uses the filter-chain to extend the monitoring statistics. Because a data pool is used, many steps of opening and closing database links are saved: the application program reuses an existing database link instead of rebuilding a new one, which greatly increases the efficiency of the database and improves the speed of data transmission.
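The link-reuse idea behind the data pool can be sketched minimally; the class name and fixed pool size are illustrative assumptions, and a real pool also handles validation, timeouts and growth:

```python
import queue

class ConnectionPool:
    """Minimal pool sketch: hand out idle database links instead of
    opening a new one per request, and take them back on release."""

    def __init__(self, factory, size):
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put(factory())  # open `size` links up front

    def acquire(self):
        return self._idle.get()        # reuse an existing link

    def release(self, conn):
        self._idle.put(conn)           # return it for the next caller
```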
Obviously, in the embodiment of the present invention, the data collection module 1 first collects the data information, the crawler synchronization module 2 then synchronizes the data to the data cleaning module 3, and the data cleaning module 3 performs format standardization, type conversion, verification, classification, and splitting and splicing, effectively reclassifying, integrating and cleaning the data into each standardized database module 5; finally, the data is screened and displayed through the KAFKA module 4 and the database module 5. In this method the data is first gathered in the data collection module 1 and only then screened and cleaned. Compared with the existing approach of screening before collecting, the present invention's approach of collecting first and then screening and cleaning is more conducive to gathering all relevant target data: it avoids the loss of target data, effectively ensures that related or adjacent target data is stored for backup, and reduces the workload of the user's next data collection. Reclassifying, integrating and cleaning the data into each standardized database module 5 through the data cleaning module 3 improves the accuracy of data cleaning, overcomes the defect of low screening and cleaning efficiency caused by big data loss in the prior art, and achieves the purpose of screening and cleaning data quickly and accurately.
 The various embodiments in this specification are described in a progressive manner, and each embodiment focuses on the points that are different from other embodiments, and the same and similar parts between the various embodiments can be referred to each other.
 The above description of the disclosed embodiments enables those skilled in the art to practice the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.