Automated supervised learning method with multi-source data supported

An automated supervised learning method that supports multi-source data. It simplifies the machine learning process, saving preprocessing time, reducing complexity, and reducing manual intervention.

Status: Inactive
Publication Date: 2017-12-26
ZHEJIANG UNIV
Cites: 0; Cited by: 24

AI Technical Summary

Problems solved by technology

[0004] Automated machine learning is still in its infancy, and many of its research topics remain exploratory. The machine learning industry as a whole has not yet adopted automated machine learning to simplify the machine learning process. On the other h...

Embodiment Construction

[0033] In order to describe the present invention more specifically, the technical solutions of the present invention will be described in detail below in conjunction with the accompanying drawings and specific embodiments.

[0034] As shown in Figure 1 and Figure 2, the automated supervised learning method of the present invention, which supports multi-source data, comprises the following steps:

[0035] (1) Preprocessing of multi-source data structures.

[0036] Format conversion, statistical analysis, missing value processing, deduplication, training set division, and category determination are performed on the source data in sequence. The output is D_train and D_test in a unified format.
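The later steps of this sequence can be illustrated with a minimal sketch. It is not the patent's implementation: the function name `preprocess`, the mean-imputation strategy, the hold-out ratio, and the reading of "category determination" as collecting the distinct label values are all assumptions made for illustration. It also assumes that every non-label column is numeric and that records arrive as a list of dictionaries after format conversion.

```python
import random
from collections import Counter

def preprocess(rows, label_key, test_ratio=0.25, seed=0):
    """Toy version of the preprocessing steps on a list of dict records."""
    # Statistical analysis: per-column mean over the values that are present.
    columns = [c for c in rows[0] if c != label_key]
    means = {}
    for c in columns:
        present = [r[c] for r in rows if r[c] is not None]
        means[c] = sum(present) / len(present) if present else 0.0

    # Missing value processing: mean imputation (an assumed strategy).
    filled = [
        {k: (means[k] if v is None and k in means else v) for k, v in r.items()}
        for r in rows
    ]

    # Deduplication: keep the first occurrence of each identical record.
    seen, unique = set(), []
    for r in filled:
        key = tuple(sorted(r.items()))
        if key not in seen:
            seen.add(key)
            unique.append(r)

    # Training set division: shuffled hold-out split into D_train and D_test.
    rng = random.Random(seed)
    shuffled = unique[:]
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_ratio))
    d_test, d_train = shuffled[:n_test], shuffled[n_test:]

    # Category determination (assumed meaning): distinct label values.
    categories = sorted(Counter(r[label_key] for r in unique))
    return d_train, d_test, categories
```

Calling `preprocess(rows, "label")` returns the two splits and the label set; a real system would likely replace each step with a configurable strategy.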

[0037] 1.1 Convert source data in five different formats: CSV, JSON, DAT, Parquet, and SAS. The system's strategy is as follows: CSV, JSON, and Parquet files are converted using the Spark SQL library, which specifically uses DataFrame-based (Spark SQL librar...
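The Spark SQL part of this conversion could look roughly like the sketch below. The helper names `detect_format` and `load_as_dataframe` and the extension-to-format mapping are hypothetical, and `spark` is assumed to be an existing `SparkSession` supplied by the caller. Because the source text is cut off before it explains how DAT and SAS files are handled, the sketch leaves those two formats unimplemented rather than guessing.

```python
import os

# Formats the source says are converted through the Spark SQL library.
SPARK_SQL_FORMATS = {".csv": "csv", ".json": "json", ".parquet": "parquet"}

def detect_format(path):
    """Map a file extension to a Spark data source name (hypothetical helper)."""
    ext = os.path.splitext(path)[1].lower()
    if ext in SPARK_SQL_FORMATS:
        return SPARK_SQL_FORMATS[ext]
    if ext in (".dat", ".sas7bdat"):
        # The source is truncated before describing DAT and SAS handling,
        # so this sketch deliberately does not implement them.
        raise NotImplementedError(f"no conversion sketched for {ext}")
    raise ValueError(f"unsupported format: {ext}")

def load_as_dataframe(spark, path):
    """Read a file into a Spark DataFrame; `spark` is an existing SparkSession."""
    fmt = detect_format(path)
    reader = spark.read.format(fmt)
    if fmt == "csv":
        # Treat the first row as column names and infer column types.
        reader = reader.option("header", "true").option("inferSchema", "true")
    return reader.load(path)
```

Routing every supported format through one function means the downstream steps only ever see a DataFrame, which is one plausible way to reach the "unified format" the text describes.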

Abstract

The invention discloses an automated supervised learning method supporting multi-source data, comprising the steps of (1) data preprocessing, (2) feature engineering, (3) model and hyperparameter tuning, and (4) Bayesian pipeline optimization. The method automates the traditional data analysis process and fundamentally improves on the manual tuning of a machine learning pipeline. With hyperparameter tuning and pipeline optimization tightly coupled, the system's extensibility to supervised learning algorithms is greatly improved. A genetic algorithm is used to tune the hyperparameters of the machine learning pipeline, which greatly improves the efficiency of automated parameter tuning. In addition, a Bayesian optimizer is used to optimize the combination of pipeline algorithms, which substantially mitigates the explosion of the combination space. As a result, both the accuracy and the efficiency of the automated supervised learning method are improved.
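To make the idea of genetic-algorithm hyperparameter tuning concrete, here is a generic sketch. It is not the patent's algorithm, whose details are not given in this excerpt: the function name `genetic_search`, the elitist selection, uniform crossover, mutation rate, and population sizes are all illustrative assumptions. In a real pipeline the `fitness` callable would typically be a cross-validated score of a model trained with the candidate hyperparameters.

```python
import random

def genetic_search(space, fitness, pop_size=12, generations=15,
                   mutation_rate=0.2, seed=0):
    """Search a discrete hyperparameter space with a simple genetic algorithm."""
    rng = random.Random(seed)
    names = list(space)

    def random_individual():
        return {n: rng.choice(space[n]) for n in names}

    population = [random_individual() for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: keep the fitter half of the population as parents.
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Uniform crossover: each gene comes from one of the two parents.
            child = {n: rng.choice((a[n], b[n])) for n in names}
            # Mutation: occasionally resample a gene from the search space.
            for n in names:
                if rng.random() < mutation_rate:
                    child[n] = rng.choice(space[n])
            children.append(child)
        population = parents + children
    return max(population, key=fitness)
```

A call such as `genetic_search({"depth": [1, 2, 3], "lr": [0.01, 0.1]}, score)` returns the best configuration found, where `score` is a user-supplied function. Because the parents survive each generation unchanged, the best fitness never decreases from one generation to the next.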

Description

Technical field

[0001] The invention belongs to the technical field of machine learning, and in particular relates to an automated supervised learning method supporting multi-source data.

Background technique

[0002] Machine learning is an interdisciplinary subject spanning probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory, and other disciplines. It studies how computers simulate or implement human learning behavior in order to acquire new knowledge or skills and to reorganize existing knowledge structures so as to continuously improve their performance. Machine learning is divided into supervised learning and unsupervised learning. The development of machine learning has now entered a new stage, and its research fields have expanded to an unprecedented degree, including expert systems, cognitive simulation, planning and problem solving, data mining, network information services, image recognition, fault diagnosi...

Application Information

IPC(8): G06N99/00, G06N3/00, G06K9/62, G06K9/46
CPC: G06N3/006, G06N20/00, G06V10/40, G06F18/24155
Inventor: 尹建伟, 范子琨, 邓水光, 李莹, 吴健, 吴朝晖
Owner: ZHEJIANG UNIV