Similarity measurement method based on attribute selection

A similarity measurement and attribute selection technique in the field of information processing that addresses high algorithmic complexity and a complicated calculation process while achieving good performance.

Status: Inactive. Publication date: 2018-11-13
GUANGDONG POWER GRID CO LTD +1
Cites: 3; Cited by: 7

AI Technical Summary

Problems solved by technology

[0006] To overcome at least one of the defects of the prior art described above, the present invention provides a similarity measurement method based on attribute selection. The method takes the importance of attributes into account when building the partition forest, which overcomes the high algorithmic complexity encountered when processing high-dimensional data. It performs better than other algorithms in outlier detection and can process high-dimensional data effectively.

Embodiment Construction

[0059] The accompanying drawings are for illustration only and should not be construed as limiting this patent. To better illustrate this embodiment, some components in the drawings are omitted, enlarged, or reduced, and they do not represent the dimensions of the actual product. Those skilled in the art will understand that some well-known structures and their descriptions may be omitted from the drawings. The positional relationships shown in the drawings are likewise illustrative only and should not be construed as limiting this patent.

[0060] Definition 1: An information system is expressed as $S=(U,C,V,f)$, where $U=\{x_1,x_2,\ldots,x_n\}$ is the instance set, $C=\{c_1,c_2,\ldots,c_n\}$ is the attribute set, $V$ is the value set of $C$, and $f: U \times C \to V$ is the mapping function.
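
As a concrete illustration of Definition 1, the information system can be held as a lookup table in which `table[x][c]` stores $f(x, c)$. The Python sketch below is a minimal, hypothetical representation: the class name `InformationSystem` and the toy instances, attributes, and values are illustrative and do not come from the patent.

```python
from collections.abc import Hashable
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class InformationSystem:
    """S = (U, C, V, f) from Definition 1 (hypothetical representation)."""
    U: List[str]                           # instance set {x_1, ..., x_n}
    C: List[str]                           # attribute set {c_1, ..., c_n}
    table: Dict[str, Dict[str, Hashable]]  # table[x][c] holds f(x, c)

    def f(self, x: str, c: str) -> Hashable:
        """The mapping f: U x C -> V, answered by table lookup."""
        return self.table[x][c]

    def V(self) -> Set[Hashable]:
        """The value set V: every value that f takes on U x C."""
        return {self.f(x, c) for x in self.U for c in self.C}


# Hypothetical toy system with three instances and two attributes
S = InformationSystem(
    U=["x1", "x2", "x3"],
    C=["c1", "c2"],
    table={
        "x1": {"c1": "low", "c2": 0},
        "x2": {"c1": "low", "c2": 1},
        "x3": {"c1": "high", "c2": 1},
    },
)
print(S.f("x2", "c1"))  # low
print(len(S.V()))       # 4
```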

[0061] Definition 2: Any subset $B \subseteq C$ determines an indiscernibility relation $IND(B)$ on $U$, defined as follows: if and only if for any $b \in$ ...
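
The text of Definition 2 is cut off in the source. The sketch below assumes the standard rough-set definition, under which $(x, y) \in IND(B)$ if and only if $f(x, b) = f(y, b)$ for every $b \in B$, and groups $U$ into the equivalence classes of $IND(B)$ (the partition $U/IND(B)$). The function names `ind_classes` and `indiscernible` and the toy table are hypothetical.

```python
from collections import defaultdict
from collections.abc import Hashable
from typing import Dict, List, Sequence

Table = Dict[str, Dict[str, Hashable]]  # table[x][c] = f(x, c)


def ind_classes(table: Table, B: Sequence[str]) -> List[List[str]]:
    """Partition U into the equivalence classes of IND(B).

    Assumes the standard rough-set definition: x and y are indiscernible
    under B iff f(x, b) == f(y, b) for every attribute b in B.
    """
    classes: Dict[tuple, List[str]] = defaultdict(list)
    for x, row in table.items():
        key = tuple(row[b] for b in B)  # x's value vector restricted to B
        classes[key].append(x)
    return list(classes.values())


def indiscernible(table: Table, B: Sequence[str], x: str, y: str) -> bool:
    """True iff (x, y) belongs to IND(B)."""
    return all(table[x][b] == table[y][b] for b in B)


# Hypothetical toy table
table = {
    "x1": {"c1": "low", "c2": 0},
    "x2": {"c1": "low", "c2": 1},
    "x3": {"c1": "high", "c2": 1},
}
print(ind_classes(table, ["c1"]))                     # [['x1', 'x2'], ['x3']]
print(indiscernible(table, ["c1", "c2"], "x1", "x2"))  # False
```

Adding attributes to $B$ can only refine the partition: with $B=\{c_1\}$ the instances x1 and x2 are indiscernible, while with $B=\{c_1, c_2\}$ every instance falls into its own class.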

Abstract

The invention relates to the technical field of information processing, and in particular to a similarity measurement method based on attribute selection. The method comprises three stages. (1) Data preparation: the initial data are preprocessed and continuous data sets are discretized. (2) Random forest building, comprising attribute selection and instance division: the attribute with the greatest significance in the attribute set is selected to partition the instances of the data set, and a partition forest of m decision trees is then built iteratively. (3) Similarity computation: the similarity between any two instances x and y in the instance set is computed from the m decision trees of the partition forest. Because attribute significance is considered when the partition forest is built, the method overcomes the relatively high algorithmic complexity and complicated computation involved in processing high-dimensional data. Compared with other algorithms, it performs better in outlier detection and handles high-dimensional data effectively.
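
The three stages in the abstract can be sketched end to end in Python. This is a hedged illustration, not the patented algorithm. The abstract does not specify the discretization scheme, the significance measure, how the m trees differ from one another, or the similarity formula, so each of these is an assumption here. Equal-width binning stands in for discretization, the Shannon entropy of an attribute's values stands in for its significance, each tree is built on a random attribute subset, and the similarity of x and y is taken as the fraction of trees in which they reach the same leaf. All function names and parameters (`bins`, `n_attrs`, `min_size`, `seed`) are hypothetical.

```python
import math
import random
from collections import Counter, defaultdict
from typing import Dict, List, Sequence

Table = Dict[str, Dict[str, int]]  # table[x][a] = discretized value f(x, a)


def discretize(data: Dict[str, Dict[str, float]], bins: int = 3) -> Table:
    """Stage 1 (assumed scheme): equal-width binning of every continuous attribute."""
    attrs = list(next(iter(data.values())))
    out: Table = {x: {} for x in data}
    for a in attrs:
        lo = min(row[a] for row in data.values())
        hi = max(row[a] for row in data.values())
        width = (hi - lo) / bins or 1.0  # a constant attribute maps to bin 0
        for x, row in data.items():
            out[x][a] = min(int((row[a] - lo) / width), bins - 1)
    return out


def significance(table: Table, instances: Sequence[str], a: str) -> float:
    """Hypothetical stand-in for attribute significance: entropy of a's values on these instances."""
    counts = Counter(table[x][a] for x in instances)
    n = len(instances)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def build_tree(table: Table, instances: List[str], attrs: List[str],
               min_size: int, path: tuple = ()) -> Dict[str, tuple]:
    """Stage 2: split on the most significant attribute and recurse; map each instance to its leaf path."""
    if len(instances) <= min_size or not attrs:
        return {x: path for x in instances}
    scores = {a: significance(table, instances, a) for a in attrs}
    best = max(scores, key=scores.get)
    if scores[best] == 0.0:  # no remaining attribute separates these instances
        return {x: path for x in instances}
    groups: Dict[int, List[str]] = defaultdict(list)
    for x in instances:
        groups[table[x][best]].append(x)
    rest = [a for a in attrs if a != best]
    leaves: Dict[str, tuple] = {}
    for value, members in groups.items():
        leaves.update(build_tree(table, members, rest, min_size, path + ((best, value),)))
    return leaves


def build_forest(table: Table, m: int = 10, n_attrs: int = 2,
                 min_size: int = 1, seed: int = 0) -> List[Dict[str, tuple]]:
    """Build m trees, each on a random attribute subset (the randomization is an assumption)."""
    rng = random.Random(seed)
    attrs = list(next(iter(table.values())))
    return [build_tree(table, list(table), rng.sample(attrs, min(n_attrs, len(attrs))), min_size)
            for _ in range(m)]


def similarity(forest: List[Dict[str, tuple]], x: str, y: str) -> float:
    """Stage 3 (assumed rule): fraction of the m trees in which x and y share a leaf."""
    return sum(tree[x] == tree[y] for tree in forest) / len(forest)


# Hypothetical toy data: two well-separated groups of instances
raw = {
    "x1": {"a": 1.0, "b": 10.0, "c": 0.1},
    "x2": {"a": 1.2, "b": 11.0, "c": 0.2},
    "x3": {"a": 9.0, "b": 50.0, "c": 0.9},
    "x4": {"a": 8.5, "b": 48.0, "c": 0.8},
}
table = discretize(raw, bins=3)
forest = build_forest(table, m=20, n_attrs=2, seed=42)
print(similarity(forest, "x1", "x2"), similarity(forest, "x1", "x3"))  # 1.0 0.0
```

In this toy data, x1 and x2 (and likewise x3 and x4) fall into the same bin on every attribute, so they share a leaf in all 20 trees, while x1 and x3 never do. On less cleanly separated data, the per-tree attribute subsets and the `min_size` stopping rule let the averaged score take intermediate values between 0 and 1.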

Description

Technical Field

[0001] The present invention relates to the technical field of information processing, and more specifically to a similarity measurement method based on attribute selection.

Background

[0002] In recent years, with the rapid development of information technology, large volumes of high-dimensional data have been generated, and data mining techniques capable of processing such data are needed to extract the valuable information hidden in it. At present, enterprise management in the power industry has raised the massive business data generated by business application systems to the level of data assets, and the management of these assets has been assigned to a dedicated department. Judging by current management results and usage, data quality in the power industry is not encouraging. The main reason is that the management method is too technical, or the managem...

Application Information

IPC(8): G06F17/30; G06K9/62
CPC: G06F18/2323
Inventors: 曾瑛, 李星南, 付佳佳, 何杰, 李溢杰, 苏卓
Owner: GUANGDONG POWER GRID CO LTD