Data processing method and device based on Spark

A data processing and data type technology in the computer field. It addresses the high technical demands placed on the person querying the data and makes querying simple and easy to use.

Active Publication Date: 2017-05-31
HAIER YOUJIA INTELLIGENT TECH BEIJING CO LTD

AI Technical Summary

Problems solved by technology

[0003] The present invention provides a Spark-based data processing method and device to solve the prior-art problem that querying data on the distributed file system HDFS requires developing a query program and places relatively high technical demands on the person querying.

Examples

Example 1

[0070] Example 1: query the alarm data of Haier drum washing machines for October 14, 2016, and save it in a CSV file, using the program of the embodiment of the present invention: -q "select * from alarm data" -t 20161014 -p 'drum washing machine number' -o "/data/query result.csv"

[0071] That is, the data type is alarm data, the time is 20161014, and the device type is given by the drum washing machine number.
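The option letters in Example 1 (-q, -t, -p, -o) can be split into the query information listed in [0071] with ordinary command-line parsing. The following is a minimal Python sketch, not the patented program: the function name parse_query_args, the returned field names, and the rule that the data type is the text after FROM in the -q statement are assumptions made for illustration.

```python
import argparse
import re
import shlex


def parse_query_args(command_line):
    """Split an Example 1 style command line into query information (sketch)."""
    parser = argparse.ArgumentParser(prog="query", add_help=False)
    parser.add_argument("-q", dest="sql", required=True)   # SQL statement
    parser.add_argument("-t", dest="time")                 # time, e.g. 20161014
    parser.add_argument("-p", dest="device_type")          # device type
    parser.add_argument("-o", dest="output", nargs="+")    # output path [file count]
    args = parser.parse_args(shlex.split(command_line))

    # Assumption (hypothetical rule): the data type is the text after FROM in -q.
    match = re.search(r"\bfrom\s+(.+?)\s*$", args.sql, flags=re.IGNORECASE)
    return {
        "data_type": match.group(1) if match else None,
        "time": args.time,
        "device_type": args.device_type,
        "output": args.output,
    }


if __name__ == "__main__":
    example = ('-q "select * from alarm data" -t 20161014 '
               "-p 'drum washing machine number' -o '/data/query result.csv'")
    print(parse_query_args(example))
```

argparse also accepts the attached form -t20161014 for single-letter options, so both spellings seen in scraped copies of the example parse the same way.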

Example 2

[0072] Example 2: merge a large number of small Parquet files under the /sample/ directory on HDFS into 7 large files, using the program of the embodiment of the present invention: -q "select * from /sample/*.parquet" -o "/data/merge file.parquet" 7
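Spark SQL can query files directly when the table name is written as format.`path`, so a path-based query like the one in Example 2 could be rewritten into that form before it is run. A minimal sketch, assuming the file format is inferred from the extension and that the trailing number after the -o path is the desired output file count; rewrite_path_query, parse_output_spec, and _format_of are hypothetical helpers.

```python
import os
import re

# Matches "from /some/path" so the path can be wrapped as format.`path`.
_PATH_SOURCE = re.compile(r"\b(from)\s+(/\S+)", re.IGNORECASE)


def _format_of(path):
    """Infer the file format from the extension, e.g. '.Parquet' -> 'parquet'."""
    return os.path.splitext(path)[1].lstrip(".").lower()


def rewrite_path_query(sql):
    """Rewrite 'from /dir/*.ext' into Spark SQL's 'from ext.`/dir/*.ext`'."""
    return _PATH_SOURCE.sub(
        lambda m: f"{m.group(1)} {_format_of(m.group(2))}.`{m.group(2)}`", sql
    )


def parse_output_spec(tokens):
    """Split the -o tokens into (path, format, number_of_files or None)."""
    path = tokens[0]
    num_files = int(tokens[1]) if len(tokens) > 1 else None
    return path, _format_of(path), num_files


if __name__ == "__main__":
    print(rewrite_path_query("select * from /sample/*.parquet"))
    print(parse_output_spec(["/data/merge file.parquet", "7"]))
```

In PySpark, the rewritten statement could then be passed to spark.sql(...) and the result written with df.coalesce(7).write.parquet(path). coalesce lowers the partition count without a full shuffle, and each partition typically becomes one output file. The patent does not say whether its program uses coalesce, repartition, or another mechanism.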

[0073] Device embodiment

[0074] An embodiment of the present invention provides a Spark-based data processing device (see Figure 4). The device includes: a receiving unit, configured to receive the SQL statement input by the user; and an acquisition unit, configured to acquire the query information in that SQL statement. The content after the "-q" field is recognized as the data type, the content after the "-t" field as the time, and the content after the "-p" field as the device type; a preset SQL statement is generated from the data type, time, and device type, and, according to that SQL statement, a data query is performed on the distributed file system HDFS based on the ope...
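The acquisition unit's final step, turning the recognized data type, time, and device type into a preset SQL statement, might look like the sketch below. Everything concrete in it is hypothetical: the DATA_SOURCES table, the /warehouse/alarm directory layout partitioned by date, the device_type column name, and the function build_preset_sql are not disclosed in the patent. The statement uses Spark SQL's format.`path` syntax so that it could be passed to spark.sql(...) unchanged.

```python
from datetime import datetime

# Hypothetical mapping from a recognized data type to its HDFS root directory.
DATA_SOURCES = {
    "alarm data": "/warehouse/alarm",
}


def build_preset_sql(data_type, time, device_type):
    """Generate a preset Spark SQL statement from the three query fields."""
    root = DATA_SOURCES[data_type]              # raises KeyError for unknown types
    day = datetime.strptime(time, "%Y%m%d")     # validates a yyyymmdd value such as 20161014
    # Hypothetical layout: one directory of Parquet files per day.
    path = f"{root}/{day:%Y/%m/%d}/*.parquet"
    # Keep the sketch simple: refuse values that would need SQL escaping.
    if "'" in device_type or "\\" in device_type:
        raise ValueError("device type contains characters that need escaping")
    return (
        f"SELECT * FROM parquet.`{path}` "
        f"WHERE device_type = '{device_type}'"
    )


if __name__ == "__main__":
    print(build_preset_sql("alarm data", "20161014", "drum washing machine number"))
```

Rejecting quotes rather than escaping them avoids guessing at string-literal rules; a production version would more likely filter through the DataFrame API instead of string-built SQL.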

Abstract

The invention discloses a Spark-based data processing method and device. According to received search information, the method maps the corresponding data source on the distributed file system HDFS, generates a preset SQL statement from the data source and the search conditions, queries the data on HDFS according to that SQL statement to obtain a search result, and finally outputs the search result. The method is therefore simple and easy to use: an ordinary user can conveniently search data and convert data formats on a big data platform without advanced technical knowledge and without writing code or developing programs.

Description

Technical field

[0001] The invention relates to the field of computer technology, in particular to a Spark-based data processing method and device.

Background

[0002] The distributed file system HDFS of Hadoop, the basic framework for distributed systems, holds massive amounts of data stored in compressed file formats. These data must be queried along different dimensions, such as time range, device type, and data type. With existing methods, however, a query program has to be developed for every query, or some mapping has to be applied to the data, which places relatively high technical demands on the person querying and is inconvenient.

Summary of the invention

[0003] The present invention provides a Spark-based data processing method and device to solve the problem in the prior art that querying data on a distributed file system HDFS requires the development of a query program...

Application Information

Patent type and authority: Application (China)
IPC(8): G06F17/30
CPC: G06F16/182, G06F16/2433
Inventor: 戚永峰 (Qi Yongfeng)
Owner: HAIER YOUJIA INTELLIGENT TECH BEIJING CO LTD