Disease danger factor extracting method based on improved K-means clustering

A risk factor extraction method, applied in the fields of big data technology and medicine, that addresses problems such as hard-to-determine parameters, low clustering accuracy, and a large amount of calculation, and achieves higher extraction accuracy.

Status: Inactive; Publication Date: 2019-07-02

NANJING UNIV OF SCI & TECH



AI Technical Summary

Problems solved by technology

[0006] Among current clustering methods, the traditional K-means algorithm has two shortcomings. 1) Selection of the initial points: many solutions have been proposed, but an element of randomization remains unavoidable. 2) Determination of the number of clusters: choosing the number of clusters before the data is well understood is a major difficulty. Most existing methods rely on a scoring mechanism, which requires a large amount of calculation once big data is involved.
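To make the two shortcomings concrete, here is a minimal sketch of plain (unimproved) K-means in Python. It is an illustration, not the patent's method: the function name `kmeans`, its parameters, and the toy points are all assumptions. The caller has to supply `k` up front, and the starting centroids are drawn at random.

```python
import random


def kmeans(points, k, seed=None, max_iter=100):
    """Plain Lloyd-style K-means; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Shortcoming 1: the initial centroids are sampled at random from the data.
    # Shortcoming 2: k must be chosen by the caller before clustering starts.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [None] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid
        # (squared Euclidean distance).
        new_labels = [
            min(range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


# Toy data (invented for illustration): two well-separated groups.
toy_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(toy_points, k=2, seed=0)
```

On data this cleanly separated the random start does not matter, but on real data different seeds can lead to different final clusters.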

The Canopy algorithm has the following disadvantages: 1) Clustering accuracy is low: the algorithm forms several canopies that overlap one another, which introduces a large error. 2) It contains a randomized part: the center point of each canopy is selected at random. 3) The distance thresholds T1 and T2 must be set manually, and suitable values are difficult to determine.
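The three disadvantages are easier to see in a sketch of the classic Canopy procedure. This is the standard textbook form, not the improved variant the patent claims. The function name `canopy`, the argument names `t1` and `t2`, and the toy points are assumptions made for illustration.

```python
import math
import random


def canopy(points, t1, t2, seed=None):
    """Classic Canopy clustering; returns a list of (center, members) pairs."""
    if t1 <= t2:
        raise ValueError("expected loose threshold t1 > tight threshold t2")
    rng = random.Random(seed)
    remaining = list(points)
    canopies = []
    while remaining:
        # Disadvantage 2: the next canopy center is picked at random.
        center = remaining.pop(rng.randrange(len(remaining)))
        # Points within the loose threshold t1 join this canopy. They may
        # also join later canopies, so canopies can overlap (disadvantage 1).
        members = [center] + [p for p in remaining if math.dist(center, p) < t1]
        # Points within the tight threshold t2 can no longer become centers.
        remaining = [p for p in remaining if math.dist(center, p) >= t2]
        canopies.append((center, members))
    return canopies


# Toy data (invented for illustration). Disadvantage 3: t1 and t2 are
# hand-picked here and would need tuning on real data.
toy_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
toy_canopies = canopy(toy_points, t1=5.0, t2=3.0, seed=0)
```

A common way to pair Canopy with K-means, which this excerpt does not spell out, is to use the number of canopies as the cluster count and the canopy centers as the initial centroids.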


Embodiment

[0063] 1. Construct a user information matrix and label vector based on a user questionnaire for a certain disease. In this example, a breast cancer data set is used to construct the user information matrix and label vector. There are 569 cell biopsy cases in total, and each case has 30 features describing the cell nuclei shown in the breast mass biopsy image. Each answer is a numerical index: the mean, standard deviation, and maximum value of the nucleus radius, texture, perimeter, area, smoothness, compactness, concavity, concave points, symmetry, and fractal dimension. The user information matrix has size 569*31, and its first column holds the unique identification number of each case. The problem features in this embodiment are specifically: ['mean radius', 'mean texture', 'mean perimeter', 'mean area', 'mean smoothness', 'mean compactness', 'mean concavity', 'mean concave points', 'mean sy...
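The matrix layout described in this step, with a case ID in the first column followed by numeric features, can be sketched as follows, together with the standardization that the method applies to the matrix afterwards. The toy values, the label vector, and the helper names `split_id_column` and `standardize` are invented for illustration. Z-score scaling is an assumption, since the text only says the matrix is standardized.

```python
import statistics


def split_id_column(matrix):
    """Separate the first column (case ID) from the feature columns."""
    ids = [row[0] for row in matrix]
    features = [list(row[1:]) for row in matrix]
    return ids, features


def standardize(features):
    """Z-score each column: subtract its mean, divide by its std deviation."""
    columns = list(zip(*features))
    means = [statistics.fmean(col) for col in columns]
    stds = [statistics.pstdev(col) for col in columns]
    return [
        # Constant columns (std of 0) are mapped to 0.0 to avoid dividing by zero.
        [(x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, means, stds)]
        for row in features
    ]


# Toy stand-in for the 569*31 matrix: [case_id, mean radius, mean texture].
# All values are made up for illustration.
toy_matrix = [
    [101, 10.0, 20.0],
    [102, 12.0, 22.0],
    [103, 14.0, 24.0],
]
toy_labels = [0, 0, 1]  # hypothetical label vector, one entry per case

ids, features = split_id_column(toy_matrix)
scaled = standardize(features)
```

Keeping the ID column out of the scaled features matters because an identifier carries no measurement information and would otherwise distort any distance computation.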


Abstract

The invention discloses a disease danger factor extraction method based on improved K-means clustering. The method comprises the following steps: first, a user information matrix and a label vector are constructed from a user questionnaire for a disease; second, the user information matrix is standardized; third, feature selection is performed on the standardized matrix by taking the intersection of the features retained by the chi-square test and by minimum-variance elimination, which yields the relevant features and a problem feature data matrix; fourth, the feature attributes in the problem feature data matrix are clustered using an improved Canopy algorithm together with the K-means algorithm, which yields a set of clusters; finally, correlation coefficient analysis is performed on each cluster, and the feature with the highest correlation coefficient is selected as the representative feature of that cluster and added to the danger factor set. Compared with the frequency-based methods used for danger factor extraction in the existing medical field, the method extracts disease danger factors more efficiently and accurately.
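The final step, picking one representative feature per cluster, can be sketched as below. The abstract does not say what each feature is correlated against. This sketch assumes the absolute Pearson correlation with the label vector, and that is an assumption rather than something the source confirms. The names `pearson` and `representative_features` and the toy data are likewise hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    # A constant sequence has no defined correlation; treat it as 0.0.
    return cov / (sx * sy) if sx and sy else 0.0


def representative_features(columns, clusters, labels):
    """Pick, per cluster, the feature most correlated with the labels.

    columns:  dict mapping feature name -> list of values (one per case)
    clusters: list of lists of feature names (output of the clustering step)
    labels:   label vector, one entry per case
    """
    danger_factors = []
    for cluster in clusters:
        # Assumption: "highest correlation" means the largest absolute
        # Pearson correlation with the label vector.
        best = max(cluster, key=lambda name: abs(pearson(columns[name], labels)))
        danger_factors.append(best)
    return danger_factors


# Toy data, invented for illustration: four features in two clusters.
toy_columns = {
    "f1": [1, 2, 3, 4],
    "f2": [1, 3, 2, 4],
    "f3": [4, 1, 3, 2],
    "f4": [1, 1, 2, 3],
}
toy_clusters = [["f1", "f2"], ["f3", "f4"]]
toy_labels = [0, 0, 1, 1]
danger_factor_set = representative_features(toy_columns, toy_clusters, toy_labels)
```

Because one feature is kept per cluster, the size of the danger factor set equals the number of clusters found in the previous step.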

Description

Technical field

[0001] The invention relates to the fields of big data technology and medicine, in particular to a disease risk factor extraction method based on improved K-means clustering.

Background technique

[0002] Gastroesophageal reflux disease is a disease in which gastric contents flow back into the esophagus, causing discomfort and complications. It is a common clinical disease of the digestive system, is widespread in both Asian and Western countries, and its incidence is rising year by year. According to research, gastroesophageal reflux disease is related to factors such as personal lifestyle, eating habits, and mental state, and the condition is prone to change. Exploring the risk factors of gastroesophageal reflux disease through big data technology is therefore of great significance for treating and preventing the disease.

[0003] At present, for the risk factors of gastroesophageal reflux and other diseas...


Application Information


Patent Type & Authority: Applications (China)

IPC(8): G16H50/70, G06K9/62

CPC: G16H50/70, G06F18/23213

Inventors: 徐雷 (Xu Lei), 姚澜 (Yao Lan)

Owner: NANJING UNIV OF SCI & TECH
