Enterprise data analysis method based on clustering ensemble learning

A technology combining ensemble learning with enterprise data, applied in the field of information processing, that addresses the poor accuracy, stability, and robustness of existing analyses and yields accurate, practical, and computationally convenient results.

Pending Publication Date: 2022-01-11
GUANGDONG UNIV OF TECH

AI Technical Summary

Problems solved by technology

[0004] To address the poor accuracy, stability, and robustness of the results produced by the above prior art, the present invention provides an enterprise data analysis method based on clustering ensemble learning, which offers accuracy, stability, robustness, convenient calculation, and strong practicability.



Examples


Embodiment 1

[0052] As shown in Figure 1, an enterprise data analysis method based on clustering ensemble learning includes the following steps:

[0053] S1. Obtain data on the industry to be analyzed, and identify the main enterprises to be analyzed in that industry. This embodiment uses popular Python libraries such as BeautifulSoup and urllib to crawl information about marine enterprises in Guangdong Province from the Internet, and analyzes the marine industry in Guangdong Province based on data such as operating areas and registered capital.

[0054] S2. Crawl the relevant data of the enterprises to be analyzed;

[0055] S3. Preprocess the crawled data, and organize the processed data into a data set;

[0056] S4. Using KMeans as the base clusterer, perform clustering ensemble learning on the data set to obtain the basic clustering results;

[0057] S5. Use the basic clustering results to construct a joint matrix; ...
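
Steps S3 to S5 can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: it assumes the "joint matrix" is a co-association matrix (the fraction of base clusterings in which two samples share a cluster), and it assumes the base KMeans runs differ in their number of clusters and random seed. The function names `base_clusterings` and `co_association`, the two synthetic features, and all parameter values are hypothetical; scikit-learn and NumPy are used for convenience.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler


def base_clusterings(X, n_runs=10, k_range=(2, 5), seed=0):
    """S4 (assumed form): run KMeans several times with varying k and seed."""
    rng = np.random.default_rng(seed)
    labelings = []
    for _ in range(n_runs):
        k = int(rng.integers(k_range[0], k_range[1] + 1))
        km = KMeans(n_clusters=k, n_init=10,
                    random_state=int(rng.integers(1 << 31)))
        labelings.append(km.fit_predict(X))
    return labelings


def co_association(labelings):
    """S5 (assumed form): share of base clusterings that put i and j together."""
    n = len(labelings[0])
    C = np.zeros((n, n))
    for labels in labelings:
        # Boolean n x n matrix: True where two samples have the same label.
        C += labels[:, None] == labels[None, :]
    return C / len(labelings)


# Synthetic stand-in for crawled enterprise data (two made-up numeric features).
rng = np.random.default_rng(42)
raw = np.vstack([
    rng.normal([100.0, 5.0], [10.0, 1.0], size=(20, 2)),
    rng.normal([1000.0, 50.0], [50.0, 5.0], size=(20, 2)),
])

# S3 (assumed form): standardize features so no single column dominates distances.
X = StandardScaler().fit_transform(raw)

labelings = base_clusterings(X)
C = co_association(labelings)
```

The resulting matrix `C` is symmetric with ones on the diagonal, so `1 - C` can serve as a pairwise dissimilarity for a later consensus step.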



Abstract

The invention discloses an enterprise data analysis method based on clustering ensemble learning. The method comprises the following steps: S1, obtaining data on an industry to be analyzed and identifying a plurality of main enterprises to be analyzed in that industry (the embodiment analyzes the marine industry in Guangdong Province); S2, crawling related data of the enterprises to be analyzed; S3, preprocessing the crawled data and organizing the preprocessed data into a data set; S4, performing clustering ensemble learning on the data set with KMeans as the base clusterer to obtain basic clustering results; S5, constructing a joint matrix from the basic clustering results; and S6, processing the joint matrix with single-link hierarchical clustering to obtain the final clustering ensemble result for the enterprises to be analyzed. The invention is characterized by accuracy, stability, robustness, convenient calculation, and high practicability.
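
Step S6 can be illustrated with a small sketch. It assumes the joint matrix holds pairwise co-occurrence frequencies in [0, 1], so that one minus the matrix can be treated as a distance; the patent text shown here does not confirm that detail. The helper name `consensus_from_joint_matrix`, the hand-written base labelings, and the choice of two final clusters are hypothetical. SciPy's `linkage` with `method="single"` provides the single-link hierarchical clustering.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def consensus_from_joint_matrix(C, n_clusters):
    """Cut a single-link dendrogram built on the dissimilarity 1 - C."""
    D = 1.0 - C
    np.fill_diagonal(D, 0.0)  # a sample is at distance 0 from itself
    # linkage expects a condensed (upper-triangle) distance vector.
    Z = linkage(squareform(D, checks=False), method="single")
    return fcluster(Z, t=n_clusters, criterion="maxclust")


# Three hand-written base clusterings of six samples (illustrative only).
base = np.array([
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 2, 2],
    [1, 1, 1, 0, 0, 0],
])

# Joint matrix (assumed form): share of clusterings in which i and j co-occur.
C = np.mean(base[:, :, None] == base[:, None, :], axis=0)

final = consensus_from_joint_matrix(C, n_clusters=2)
```

In this toy case the first three samples end up in one consensus cluster and the last three in the other, even though the second base clustering splits them differently. In practice the number of final clusters, or a distance threshold for cutting the dendrogram, would need to be chosen for the data at hand.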

Description

Technical field

[0001] The present invention relates to the technical field of information processing, and more specifically to an enterprise data analysis method based on clustering ensemble learning.

Background technique

[0002] Data clustering, or unsupervised learning, is an important but extremely difficult problem in current approaches to enterprise data analysis. Its goal is to organize or discover structure in data by dividing a set of unlabeled objects into homogeneous groups or clusters, with applications in areas such as data mining, information retrieval, image segmentation, and machine learning. In real-world problems, clusters may differ in shape, size, data sparsity, and degree of separation. Furthermore, noise in the data can mask the real underlying structure. Clustering techniques require defining a similarity measure between patterns, which is not easily specified without prior knowledge of the cluster shapes. Clustering integration is an a...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/906, G06V10/762
CPC: G06F16/906, G06F18/231, G06F18/23213
Inventor: 程良伦, 郑达成, 张伟文, 陈武兴
Owner: GUANGDONG UNIV OF TECH