Data set general situation evaluation method based on multi-source heterogeneous features

Keywords: multi-source heterogeneous data, multi-source heterogeneous technology. Applied in structured data retrieval, structured data browsing, special data processing applications, etc.

Pending Publication Date: 2020-11-03
深圳慕智科技有限公司 (Shenzhen Muzhi Technology Co., Ltd.)
Cites: 0; Cited by: 2

AI Technical Summary

Problems solved by technology

However, data feature processing takes up a large share of the time in the overall R&D workflow, and further technical work is needed to perform feature processing, feature extraction, and statistics on multi-source heterogeneous data sets automatically.

Examples

Embodiment Construction

[0020] Embodiments of the present invention are described below through specific examples, and those skilled in the art can easily understand other advantages and effects of the present invention from the content disclosed in this specification.

[0021] This patent uses Python to generate data set overview evaluation reports automatically. It relies mainly on feature processing, and the specific key technologies involved include word2vec, TF-IDF, and data topology analysis, among others.
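To illustrate the overall flow of [0021] (heterogeneous sources in, unified structured data, then overview indicators), the following is a minimal Python sketch under stated assumptions. It merges records from two made-up sources (CSV text and JSON lines) into one schema and computes a few per-field indicators: missing rate, inferred type, and mean/min/max or distinct count. The function names `unify_records` and `overview_report` and the choice of indicators are hypothetical; the patent does not disclose which indicators its report uses.

```python
import csv
import io
import json
from statistics import mean


def unify_records(csv_text, jsonl_text):
    """Hypothetical helper: merge CSV rows and JSON-lines objects into one schema."""
    records = list(csv.DictReader(io.StringIO(csv_text)))
    records += [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
    fields = sorted({key for rec in records for key in rec})
    # Fields absent from a source become None, so every record has the same keys.
    return fields, [{f: rec.get(f) for f in fields} for rec in records]


def to_number(value):
    """Return value as a float if it parses as a number, otherwise None."""
    if isinstance(value, bool):
        return None
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def overview_report(fields, records):
    """Hypothetical overview indicators: missing rate plus numeric or categorical stats."""
    report = {"num_records": len(records), "fields": {}}
    for f in fields:
        values = [r[f] for r in records if r[f] not in (None, "")]
        numbers = [n for n in map(to_number, values) if n is not None]
        stats = {"missing_rate": 1 - len(values) / len(records) if records else 0.0}
        if values and len(numbers) == len(values):
            stats.update(type="numeric", mean=mean(numbers), min=min(numbers), max=max(numbers))
        else:
            stats.update(type="categorical", distinct=len(set(map(str, values))))
        report["fields"][f] = stats
    return report


# Usage with two tiny, made-up sources.
csv_text = "id,age,city\n1,34,Shenzhen\n2,,Nanjing\n"
jsonl_text = '{"id": 3, "age": 29, "tags": "x"}\n'
fields, records = unify_records(csv_text, jsonl_text)
report = overview_report(fields, records)
print(report["fields"]["age"])
```

Treating a field as numeric only when every non-missing value parses as a number is a deliberately conservative choice; a mixed field such as `city` falls back to categorical statistics.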

[0022] 1. Text feature extraction

[0023] In the present invention, each word in each report of the training data set is mapped to a vector from which its frequency of occurrence is determined, thereby generating a keyword library. TF-IDF is then used to judge the importance of each term, so that the report generation module can produce a final defect report that accurately describes the defect. TF-IDF is a commonly used weighting technique for in...
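The keyword-library step can be sketched in Python as follows. This is a minimal sketch assuming a simple English tokenizer, the common tf × log(N/df) weighting, and an arbitrary top-k cut per report (k = 3 here). It reads "map each word to a vector" as a bag-of-words count vector; the word2vec embeddings mentioned in [0021] and any stop-word filtering are not shown, and `tf_idf` and `keyword_library` are hypothetical names.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase alphanumeric tokens (assumption: English text, no stop-word list)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tf_idf(reports):
    """Return one {term: weight} dict per report, using tf * log(N / df)."""
    token_lists = [tokenize(r) for r in reports]
    n_docs = len(token_lists)
    # Document frequency: how many reports contain each term at least once.
    df = Counter(term for tokens in token_lists for term in set(tokens))
    weights = []
    for tokens in token_lists:
        counts = Counter(tokens)  # bag-of-words count vector for this report
        total = len(tokens)
        weights.append({t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()})
    return weights


def keyword_library(reports, top_k=3):
    """Hypothetical keyword library: the top_k highest-weighted terms of every report."""
    library = set()
    for weights in tf_idf(reports):
        # Sort by descending weight, breaking ties alphabetically for determinism.
        ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
        library.update(term for term, score in ranked[:top_k] if score > 0)
    return library


reports = [
    "App crashes when login button is clicked",
    "Login page shows blank screen",
    "App crashes on startup",
]
library = keyword_library(reports)
print(sorted(library))
```

Terms that occur in several reports, such as "login", receive a low IDF and drop out, while report-specific terms such as "startup" are kept. Without a stop-word list, function words like "is" can also slip into the library, which is why a real pipeline would usually filter them first.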

Abstract

A data set general situation evaluation method based on multi-source heterogeneous features comprises two modules: an automatic feature processing module for multi-source heterogeneous data sets and a data set general situation evaluation report generation module. In the automatic feature processing module, data feature processing techniques are applied to each type of data, features are extracted automatically, and unified structured data is finally generated. In the report generation module, after the preceding data processing, the overview of the data is presented through a number of indicators, and a topological graph of the multi-dimensional features is displayed in dimension-reduced form. A topological analysis method maps high-dimensional data to a low-dimensional space so that users can understand high-dimensional data intuitively.
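The abstract does not name the topological analysis method. One widely used technique that maps high-dimensional points to a low-dimensional graph is Mapper from topological data analysis, and the Python sketch below assumes that technique. It projects points through a lens function, covers the lens range with overlapping intervals, clusters each interval's points by single linkage, and links clusters that share points. The lens, number of intervals, overlap fraction, and distance threshold `eps` are hypothetical parameters chosen for the example, not values from the patent.

```python
import math
from itertools import combinations


def single_linkage_clusters(indices, points, eps):
    """Connected components of the graph linking points at distance <= eps."""
    clusters, unvisited = [], set(indices)
    while unvisited:
        stack = [unvisited.pop()]
        component = set(stack)
        while stack:
            i = stack.pop()
            near = {j for j in unvisited if math.dist(points[i], points[j]) <= eps}
            unvisited -= near
            component |= near
            stack.extend(near)
        clusters.append(frozenset(component))
    return clusters


def mapper_graph(points, lens, n_intervals=4, overlap=0.3, eps=1.0):
    """Mapper sketch (assumed technique): cluster overlapping lens intervals, link shared points."""
    values = [lens(p) for p in points]
    lo, hi = min(values), max(values)
    length = (hi - lo) / n_intervals or 1.0  # fall back to 1.0 if the lens is constant
    nodes = []
    for k in range(n_intervals):
        # Each interval is widened by overlap * length on both sides.
        start = lo + (k - overlap) * length
        end = lo + (k + 1 + overlap) * length
        members = [i for i, v in enumerate(values) if start <= v <= end]
        nodes.extend(single_linkage_clusters(members, points, eps))
    nodes = list(dict.fromkeys(nodes))  # drop clusters found twice, keep order
    edges = [(a, b) for a, b in combinations(range(len(nodes)), 2) if nodes[a] & nodes[b]]
    return nodes, edges


# Usage: 16 points on a ring of radius 2 embedded in 3-D, lens = first coordinate.
ring = [(2 * math.cos(2 * math.pi * k / 16), 2 * math.sin(2 * math.pi * k / 16), 0.0)
        for k in range(16)]
nodes, edges = mapper_graph(ring, lens=lambda p: p[0], n_intervals=4, overlap=0.3, eps=1.0)
print(len(nodes), "nodes,", len(edges), "edges")
```

On this ring the sketch yields six nodes joined in a single cycle, so the loop structure of the three-dimensional data survives the reduction to a small graph that can be drawn in two dimensions.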

Description

[0001] A data set profile evaluation method based on multi-source heterogeneous features

Technical field

[0002] The invention belongs to the field of general-purpose software and relates in particular to the generation of evaluation reports: after data processing and data feature extraction have been completed for a data set, a corresponding data set feature evaluation report is generated.

Background art

[0003] The three major elements in the current development of artificial intelligence are data, computing power, and algorithms. Data sets, computing power, and algorithms complement and reinforce one another, and none of the three can be dispensed with. In academia, the role of data sets is even more direct: data is the foundation, no corresponding research can be carried out without data sets, and all research depends on data. In this age of machine learning, data matters more than algorithms, and the quality of the results many algorithms obtain depends entirely on how well ...

Application Information

IPC (8): G06F16/2458; G06F16/26; G06K9/62
CPC: G06F16/2465; G06F16/26; G06F18/213
Inventors: 王晓冰, 张朱佩田, 王黛薇, 刘佳玮
Owner: 深圳慕智科技有限公司 (Shenzhen Muzhi Technology Co., Ltd.)