A stream data processing method based on SparkSQL and RestAPI

A processing method for streaming data, applied in special data processing applications, database management systems, database models, and related areas. It addresses the inability of existing approaches to perform complex algorithm calculations in SQL or to extend their algorithm capabilities, and aims to provide convenient support for complex algorithm calculations and real-time computing over streaming data.

Active Publication Date: 2021-03-16
中科大数据研究院

AI Technical Summary

Problems solved by technology

Traditional data is dominated by structured data, but data from social networking sites, e-commerce, and the Internet of Things is now mostly unstructured or semi-structured. Traditional data can be effectively managed and used with a relational database management system, whereas today's data is large, fast, and complex, far exceeding the capabilities of traditional database software tools. According to an IDC report, global data usage was estimated to reach 35.2 ZB by 2020. Faced with data at this scale, the efficiency of data processing determines whether an enterprise has the corresponding capability and competitiveness, so an easy-to-use, highly reliable, high-performance data analysis system is needed.
[0003] Publication No. CN108268639A, "An Index Calculation Method in a Big Data Environment", provides an index calculation method that separates the index calculation scheduling code from the business calculation SQL and stores the SQL for data reading, preprocessing, calculation, and result storage in the database. Adding a new index then requires only a few lines of SQL, so almost no new code is needed. However, this solution can only use existing SQL capabilities and cannot be extended with the algorithm capabilities provided by an external RestAPI, so complex algorithm calculations cannot be performed through SQL.




Embodiment Construction

[0036] The solution of the present invention targets the calculation of massive streaming data. It combines the algorithm capabilities of SQL and RestAPI and, on top of distributed computing, realizes complex algorithm calculation through SQL. The stream data processing method based on SparkSQL and RestAPI is described in further detail below with reference to Figure 1. It comprises the following steps:

[0037] S1: Initialize the algorithms in the RestAPI algorithm library that are to be packaged. The algorithm types mainly include text processing, data conversion, data sampling, feature extraction, parameter estimation, data verification, data evaluation, time-series analysis, model evaluation, network computing, text analysis, recommendation algorithms, classification and regression, deep learning, and clustering algorithms.

[0038] S1.1: Sorting out the content in the RestAPI algorithm library (the "sorting" here means manually s...
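One way to picture the algorithm library that step S1 initializes is as a small catalog of algorithm descriptions. The sketch below is an assumption for illustration only: the class names `AlgorithmSpec` and `AlgorithmLibrary`, the field names, and the endpoint paths are hypothetical and do not come from the patent text.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class AlgorithmSpec:
    """Hypothetical description of one RestAPI algorithm awaiting packaging."""
    name: str                    # name a UDF could later be registered under
    category: str                # one of the algorithm types listed in S1
    endpoint: str                # REST path of the algorithm (placeholder values)
    arg_types: Tuple[str, ...]   # SQL-side argument types
    return_type: str             # SQL-side return type


class AlgorithmLibrary:
    """Hypothetical in-memory catalog of the algorithms to be packaged."""

    def __init__(self) -> None:
        self._specs: Dict[str, AlgorithmSpec] = {}

    def add(self, spec: AlgorithmSpec) -> None:
        # Reject duplicate names so each future UDF name stays unambiguous.
        if spec.name in self._specs:
            raise ValueError(f"duplicate algorithm name: {spec.name}")
        self._specs[spec.name] = spec

    def by_category(self, category: str) -> List[AlgorithmSpec]:
        # Return all algorithms of one type, in insertion order.
        return [s for s in self._specs.values() if s.category == category]

    def names(self) -> List[str]:
        return list(self._specs)


# Example entries; the endpoints are placeholders, not real services.
library = AlgorithmLibrary()
library.add(AlgorithmSpec("tokenize", "text processing", "/algo/tokenize",
                          ("string",), "array<string>"))
library.add(AlgorithmSpec("kmeans_assign", "clustering algorithm",
                          "/algo/kmeans/assign", ("array<double>",), "int"))
```

Keeping the argument and return types next to each endpoint is a design choice of this sketch: it would give a later packaging step enough information to declare a typed UDF without contacting the service.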


Abstract

The invention discloses a streaming data processing method based on SparkSQL and RestAPI (REST application programming interface). The method comprises the following steps: S1, initializing the algorithms in a RestAPI algorithm library that are to be packaged; S2, packaging the initialized algorithms in UDF format; S3, registering the packaged UDFs; S4, assembling a stream data calculation program; and S5, obtaining an SQL statement input by the user and executing it. On top of traditional SQL calculation over massive streaming data, the method provides convenient support for complex algorithm calculations, so that SQL can be used to apply complex algorithms to massive streaming data. The invention also enables real-time computation over streaming data through SQL.
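The flow of steps S2 to S5 can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the patented implementation: Python's built-in `sqlite3` module stands in for SparkSQL so the example runs without a cluster, one in-memory batch stands in for the stream, and `fake_rest_call`, `wrap_as_udf`, `run_sql`, and the `/text/length` endpoint are hypothetical names. In Spark itself, registration would go through `spark.udf.register` rather than `create_function`.

```python
import json
import sqlite3
from typing import Callable, List, Tuple


def fake_rest_call(endpoint: str, payload: dict) -> dict:
    """Stand-in for an HTTP POST to a RestAPI algorithm (hypothetical)."""
    if endpoint == "/text/length":
        return {"result": len(payload["text"])}
    raise ValueError(f"unknown endpoint: {endpoint}")


def wrap_as_udf(endpoint: str,
                transport: Callable[[str, dict], dict] = fake_rest_call):
    """S2: package a REST algorithm as a plain one-argument function."""
    def udf(value):
        # Round-trip through JSON to mimic what would travel over HTTP.
        body = json.loads(json.dumps({"text": value}))
        return transport(endpoint, body)["result"]
    return udf


def run_sql(rows: List[Tuple[int, str]], sql: str) -> list:
    conn = sqlite3.connect(":memory:")
    # S3: register the packaged function under a name usable in SQL.
    conn.create_function("text_length", 1, wrap_as_udf("/text/length"))
    # S4: assemble the calculation program; here one batch stands in for
    # a micro-batch of the stream.
    conn.execute("CREATE TABLE events (id INTEGER, msg TEXT)")
    conn.executemany("INSERT INTO events VALUES (?, ?)", rows)
    # S5: execute the SQL statement supplied by the user.
    result = conn.execute(sql).fetchall()
    conn.close()
    return result


batch = [(1, "hello"), (2, "stream data")]
result = run_sql(batch, "SELECT id, text_length(msg) FROM events ORDER BY id")
print(result)  # [(1, 5), (2, 11)]
```

The point of the sketch is the division of labor: the user writes only SQL, while the call to the external algorithm is hidden behind the registered function name.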

Description

Technical field

[0001] The invention belongs to the technical field of data service governance, and in particular relates to a service provision method based on Internet big data.

Background technique

[0002] In recent years, with the advent of the big data era, data has grown explosively and the scale of information has become increasingly large. Information data provides a solid foundation for corporate decision-making. The impact and change that massive data brings to society is unprecedented, and for enterprises, quickly and efficiently extracting useful value from data has become a new challenge. Traditional data is dominated by structured data, but data from social networking sites, e-commerce, and the Internet of Things is now mostly unstructured or semi-structured. Traditional data can be effectively managed and used with a relational database management system, whereas today's data is large, fast, and com...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/25, G06F16/28
CPC: G06F16/25, G06F16/284
Inventor: 冯凯, 徐葳, 王元卓
Owner: 中科大数据研究院