Data processing method and device based on Spark

A data processing technology applied in the computer field. It addresses the problem of high technical demands on the person querying the data, and is simple and easy to use.

Active Publication Date: 2017-05-31

HAIER YOUJIA INTELLIGENT TECH BEIJING CO LTD

3 Cites, 8 Cited by

AI Technical Summary

Problems solved by technology

[0003] The present invention provides a Spark-based data processing method and device to solve the problem in the prior art that querying data on the distributed file system HDFS requires developing a query program, which places relatively high technical demands on the person querying.

Examples

Example 1

[0070] Example 1: query the alarm data of Haier's drum washing machine for October 14, 2016, and save it in a csv file: <program of the embodiment of the present invention> -q "select * from alarm data" -t 20161014 -p 'drum washing machine number' -o /data/query result.csv

[0071] That is, the data type is alarm data, the time is 20161014, and the device type is the drum washing machine serial number.
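The way Example 1 is read (data type from the statement, time from "-t", device type from "-p") can be sketched as follows. This is a minimal sketch, not the patent's implementation: the program name `query-tool`, the `QueryInfo` type and the function `parse_query_info` are hypothetical, and the rule that the data type is whatever follows "from" in the "-q" statement is an assumption drawn from the example itself.

```python
import argparse
import re
from typing import List, NamedTuple, Optional


class QueryInfo(NamedTuple):
    sql: str                    # the raw statement passed with -q
    data_type: str              # text after FROM in the -q statement (assumed rule)
    time: Optional[str]         # value of -t, e.g. "20161014"
    device_type: Optional[str]  # value of -p
    output: str                 # value of -o


def parse_query_info(argv: List[str]) -> QueryInfo:
    """Split the command-line fields of Example 1 into query information."""
    parser = argparse.ArgumentParser(prog="query-tool")  # hypothetical program name
    parser.add_argument("-q", dest="sql", required=True, help="SQL statement")
    parser.add_argument("-t", dest="time", help="time, e.g. 20161014")
    parser.add_argument("-p", dest="device_type", help="device type")
    parser.add_argument("-o", dest="output", required=True, help="output path")
    args = parser.parse_args(argv)

    # Assumption: everything after FROM names the data type.
    match = re.search(r"\bfrom\s+(.+)$", args.sql.strip(), flags=re.IGNORECASE)
    if match is None:
        raise ValueError("the -q statement has no FROM clause")

    return QueryInfo(args.sql, match.group(1).strip(),
                     args.time, args.device_type, args.output)


info = parse_query_info([
    "-q", "select * from alarm data",
    "-t", "20161014",
    "-p", "drum washing machine number",
    "-o", "/data/query result.csv",
])
print(info.data_type, info.time, info.device_type)
```

The argument list is passed already split, as a shell would deliver it after processing the quotes, so values that contain spaces stay intact.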

Example 2

[0072] Example 2: merge a large number of small Parquet files under the /sample/ directory on HDFS into 7 large files: <program of the embodiment of the present invention> -q "select * from /sample/*.parquet" -o /data/Merge file.Parquet7
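One possible reading of Example 2 is that the digit at the end of the "-o" value gives the number of output files and the extension gives the output format. The source does not state this rule, so the sketch below is only an illustration of that assumption; `OutputSpec` and `parse_output_spec` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional


class OutputSpec(NamedTuple):
    path: str                 # output path without the trailing file count
    file_format: str          # lower-cased extension, e.g. "parquet" or "csv"
    num_files: Optional[int]  # trailing digits, if any (assumed to be a file count)


def parse_output_spec(value: str) -> OutputSpec:
    """Split an -o value such as '/data/Merge file.Parquet7' into its parts."""
    match = re.fullmatch(
        r"(?P<stem>.+)\.(?P<fmt>[A-Za-z]+)(?P<count>\d+)?", value.strip()
    )
    if match is None:
        raise ValueError(f"cannot read a format from output value: {value!r}")
    count = match.group("count")
    path = f"{match.group('stem')}.{match.group('fmt')}"
    return OutputSpec(path, match.group("fmt").lower(),
                      int(count) if count else None)


spec = parse_output_spec("/data/Merge file.Parquet7")
print(spec)
```

In a Spark job, a file count like this would typically be applied by reducing the number of partitions before writing, for instance with `DataFrame.coalesce` or `DataFrame.repartition`; whether the patented program does so is not stated in the source.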

[0073] Device embodiment

[0074] The embodiment of the present invention provides a Spark-based data processing device (see Figure 4). The device includes: a receiving unit, used to receive the SQL statement input by the user; and an acquisition unit, used to acquire the query information in the SQL statement input by the user. The content after the "from" field is recognized as the data type, the content after the "-t" field is recognized as the time, and the content after the "-p" field is recognized as the device type. A preset SQL statement is generated according to the data type, time and device type, and, according to the SQL statement, a data query is performed on the distributed file system HDFS based on the ope...

Abstract

The invention discloses a data processing method and device based on Spark. The method comprises the steps of mapping the received query information to the corresponding data source on the distributed file system HDFS, generating a preset SQL statement according to the data source and the query condition, performing a data query on the distributed file system HDFS according to the SQL statement to obtain a query result, and finally outputting the query result. The data processing method is thus simple and easy to use: an ordinary user can conveniently perform data queries and format conversion on a big data platform without needing advanced technical knowledge, and without writing code or developing programs.
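The middle two steps of the abstract (map a data source, then generate a preset SQL statement from the data source and the query condition) might look like the sketch below. Everything specific in it is an assumption made for illustration: the directory layout under `/warehouse`, the Parquet storage format, and the column names `day` and `device_type` do not come from the patent. The only Spark feature relied on is that Spark SQL can query files directly with the form "format.`path`".

```python
from typing import Optional


def map_data_source(data_type: str, root: str = "/warehouse") -> str:
    """Map a data type to an HDFS path (hypothetical layout: <root>/<data_type>)."""
    return f"{root}/{data_type.strip().replace(' ', '_')}"


def _literal(value: str) -> str:
    """Quote a value as a SQL string literal, rejecting characters that need escaping."""
    if "'" in value or "\\" in value:
        raise ValueError(f"unsupported character in condition value: {value!r}")
    return f"'{value}'"


def build_sql(source_path: str,
              time: Optional[str] = None,
              device_type: Optional[str] = None) -> str:
    """Generate a preset SQL statement from a data source and query conditions."""
    if "`" in source_path:
        raise ValueError("backticks are not allowed in the data source path")
    sql = f"SELECT * FROM parquet.`{source_path}`"  # assumes Parquet storage
    conditions = []
    if time is not None:
        conditions.append(f"day = {_literal(time)}")  # hypothetical column name
    if device_type is not None:
        conditions.append(f"device_type = {_literal(device_type)}")  # hypothetical column name
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql


statement = build_sql(map_data_source("alarm data"),
                      time="20161014",
                      device_type="drum washing machine number")
print(statement)
```

The generated string would then be handed to a Spark session for execution and the result written to the requested output; those two steps are left out here because they need a running Spark installation.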

Description

Technical field

[0001] The invention relates to the field of computer technology, in particular to a Spark-based data processing method and device.

Background

[0002] The distributed file system HDFS, based on Hadoop (the basic framework of the distributed system), holds massive amounts of data. These data are stored in compressed file formats. Queries on these data need to be based on different dimensions, such as time range, device type, data type, and so on. However, in the existing method, a query program must be developed every time the data is queried, or some mapping must be performed on the data, which places relatively high technical demands on the person querying and is inconvenient to use.

Summary of the invention

[0003] The present invention provides a Spark-based data processing method and device to solve the problem in the prior art that querying data on a distributed file system HDFS requires the development of a query program...

Application Information

Patent Type & Authority: Applications (China)

IPC(8): G06F17/30

CPC: G06F16/182, G06F16/2433

Inventor: 戚永峰

Owner: HAIER YOUJIA INTELLIGENT TECH BEIJING CO LTD
