Big data mining tool and method based on dragging process

A data mining and big data technology applied in digital data processing, structured data retrieval, database management systems, etc. It addresses the high cost of big data mining applications and the high level of professional knowledge they require, with the effects of lowering the data mining threshold, optimizing computing efficiency, and lowering the threshold of use.

Pending Publication Date: 2020-03-24
BEIJING HUARU TECH

AI Technical Summary

Problems solved by technology

Data mining has been thoroughly applied and developed on traditional small data sets. On large data sets, however, the particular demands of storage and computation mean that established data mining algorithms cannot be combined directly with big data computing frameworks, which makes big data mining applications more costly.

Examples

Embodiment 1

[0050] Referring to Figure 2, this embodiment takes big data cleaning as an example to describe the specific application of the present invention in detail.

[0051] (1) Add start and finish operators. The data mining tool uses the start and finish operators as the start and end marks of a mining process, and they guide the calculation of the entire process. If the start and finish operators are not set, no effective calculation can be performed.

[0052] (2) Add a data source. The data mining tool uses data source operators to represent the data to be mined. You can directly drag and drop the data source operators to the blank space in the process view to realize data import.

[0053] (3) Add a big data mining operator. Drag the "Data Selection" operator plug-in to a blank space in the process view to load the data selection function.

[0054] (4) Connect the operators. Connect the data source to the "Data Selection" operator to enable data transfer; the connection sequence from t...
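Steps (1) to (4) can be pictured as building a small graph of operators. The following is a minimal sketch under stated assumptions, not the patented implementation: the `Process` class, its method names, and the operator kind strings are all hypothetical, and only the data source to "Data Selection" connection stated in the text is modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Process:
    """Hypothetical in-memory model of a drag-and-drop mining process."""
    operators: dict = field(default_factory=dict)  # operator name -> operator kind
    edges: list = field(default_factory=list)      # (source name, target name) pairs

    def add_operator(self, name: str, kind: str) -> None:
        # Stands in for dropping an operator onto the blank process view.
        self.operators[name] = kind

    def connect(self, source: str, target: str) -> None:
        # Stands in for drawing a connection; data flows from source to target.
        for name in (source, target):
            if name not in self.operators:
                raise KeyError(f"unknown operator: {name}")
        self.edges.append((source, target))

    def missing_markers(self) -> list:
        # Step (1): a process needs both a start and a finish operator.
        kinds = set(self.operators.values())
        return sorted({"start", "finish"} - kinds)


process = Process()
process.add_operator("source", "data_source")     # step (2)
process.add_operator("select", "data_selection")  # step (3)
process.connect("source", "select")               # step (4)
print(process.missing_markers())  # markers not added yet, so both are reported

process.add_operator("start", "start")            # step (1)
process.add_operator("finish", "finish")
print(process.missing_markers())  # nothing is missing any more
```

Checking for the start and finish markers before any computation mirrors the statement in step (1) that a process without them cannot be calculated.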

Embodiment 2

[0059] Referring to Figure 3, this embodiment takes neural network regression prediction as an example to describe the big data submission method and the calculation engine selection method in detail.

[0060] (1) The process operators shown in Figure 3 function as follows: the "data splitting" operator divides the data into training and test sets, the "neural network" operator trains the model, the "attribute selection" operator removes the labels from the test set, and the "model application" operator makes regression predictions on the test set based on the trained model;

[0061] (2) Submit all. In this case the calculation engine calculates the output of all operators from "start" to "finish";

[0062] (3) Partial submission. In this case the calculation engine calculates the output of a specified operator only, such as the "data selection" operator. By parsing the process XML file, the "data selection" operator depends on the path. ...
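The difference between the two submission modes can be sketched as a dependency walk over the parsed process file. This is an illustration only: the XML layout (`operator` and `connection` elements with `name`, `from`, and `to` attributes), the linear example process, and the function names are assumptions, since the text does not show the real schema. The sketch also assumes the process graph has no cycles.

```python
import xml.etree.ElementTree as ET

# Hypothetical process file; the real XML schema is not given in the text.
PROCESS_XML = """
<process>
  <operator name="start"/>
  <operator name="data_source"/>
  <operator name="data_selection"/>
  <operator name="data_splitting"/>
  <operator name="finish"/>
  <connection from="start" to="data_source"/>
  <connection from="data_source" to="data_selection"/>
  <connection from="data_selection" to="data_splitting"/>
  <connection from="data_splitting" to="finish"/>
</process>
"""


def load_upstream(xml_text):
    """Map each operator name to the list of operators feeding into it."""
    root = ET.fromstring(xml_text)
    upstream = {op.get("name"): [] for op in root.iter("operator")}
    for conn in root.iter("connection"):
        upstream[conn.get("to")].append(conn.get("from"))
    return upstream


def execution_plan(upstream, target):
    """Return the target and everything it depends on, dependencies first."""
    plan, seen = [], set()

    def visit(name):
        if name in seen:
            return
        seen.add(name)
        for parent in upstream[name]:
            visit(parent)
        plan.append(name)  # appended only after all of its inputs

    visit(target)
    return plan


upstream = load_upstream(PROCESS_XML)
full_plan = execution_plan(upstream, "finish")             # "submit all"
partial_plan = execution_plan(upstream, "data_selection")  # partial submission
print(full_plan)
print(partial_plan)
```

In this sketch a full submission is treated as the dependency path of the "finish" operator, so both modes share one traversal and a partial submission simply skips every operator downstream of the chosen one.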

Abstract

The invention discloses a big data mining tool and method based on a dragging process. The big data mining tool comprises a data management module, an operator library module, a process management module, a data mining engine module, a display design module and a visualization module. The method has the following beneficial effects: (1) big data calculation and modeling are logically separated, which lowers the data mining threshold at multiple levels: on the calculation side, the big data mining algorithm library is encapsulated at a high level, so the user does not need to know how specific algorithms are implemented; on the modeling side, a graphical interactive prototype is developed, and a mining process is built through intuitive drag-and-drop connections; (2) the optimal calculation mode can be selected automatically according to the operators contained in the mining process, which optimizes calculation efficiency; (3) in big data scenarios, the calculation cost is greatly reduced; and (4) tedious cluster construction and maintenance are not needed, which lowers the threshold for using big data mining.

Description

Technical field

[0001] The invention relates to big data mining tools and methods, and in particular to a big data mining tool and method based on a drag-and-drop process.

Background

[0002] Data mining is a key technology for extracting latent knowledge from massive amounts of information, and has become an important force driving development in many fields. It has been thoroughly applied and developed on traditional small data sets. On large data sets, however, the particular demands of storage and calculation mean that established data mining algorithms cannot be combined directly with big data computing frameworks, which makes big data mining applications more costly.

[0003] In big data scenarios, data mining involves algorithm principles, parallel code implementations of algorithms, and the combination of algorithms, which demands a high level of professional knowledge from operators. Mining modeling is also a process of repeated iterative optimization, and the operation effi...

Application Information

IPC(8): G06F16/2458, G06F16/2453, G06F16/242, G06F16/248, G06F16/25, G06F3/0486
CPC: G06F3/0486, G06F16/242, G06F16/24532, G06F16/2462, G06F16/2465, G06F16/248, G06F16/254
Inventor: 王智永, 王文晋, 张可新
Owner: BEIJING HUARU TECH