53 results about "Majority class" patented technology

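Majority class: a classifier that always predicts the majority class has an accuracy of \(1 - x_i\), where \(x_i\) is the fraction of positive (minority) instances, so \(1 - x_i\) is the fraction of instances belonging to the majority class (the negative label is assumed to be the majority here).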
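To make the formula concrete, the following minimal Python sketch computes the accuracy of this trivial majority-class baseline. The function name is illustrative, and \(x_i\) is taken to be the fraction of positive (minority) instances.

```python
from collections import Counter

def majority_baseline_accuracy(labels):
    """Accuracy of a classifier that always predicts the most frequent label."""
    if not labels:
        raise ValueError("labels must be non-empty")
    _, majority_count = Counter(labels).most_common(1)[0]
    return majority_count / len(labels)

# 95 negatives (majority) and 5 positives, so x_i = 0.05 and accuracy = 1 - x_i.
labels = [0] * 95 + [1] * 5
baseline = majority_baseline_accuracy(labels)
```

On a heavily unbalanced set this baseline already scores highly, which is why the methods below report minority-class recognition rather than accuracy alone.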

Unbalanced data sampling method in improved C4.5 decision tree algorithm

The invention relates to an unbalanced data sampling method based on an improved C4.5 decision tree algorithm. The method comprises the following steps: first, the initial weight of each sample is determined from the number of samples in its class; in each round, the sample weights are modified according to the training result of the improved C4.5 decision tree; the splitting criterion of the improved C4.5 algorithm takes into account both the information gain ratio and the weights of misclassified samples; the final sample weights are obtained after T iterations; the samples in the minority-class boundary regions and in the majority-class center regions are identified from these weights; the minority-class boundary samples are over-sampled with the SMOTE algorithm; and the majority-class samples are under-sampled with a weight-based sampling method under which samples in the center regions are more likely to be selected. This improves the balance between classes and raises the recognition rates of both the minority class and the data set as a whole. Because the weights are modified by the improved C4.5 algorithm and both over-sampling and under-sampling are driven by the sample weights, the method avoids problems such as classifier over-fitting and the loss of useful majority-class information.
Owner:CHONGQING UNIV OF POSTS & TELECOMM
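The two sampling steps can be sketched as follows. This is a minimal illustration, not the patented method: the boosting weights are taken as given, `smote` implements basic SMOTE interpolation, and `weight_undersample` assumes that majority samples with low final weight lie in the center region and should be preferred. The function names and that weight-to-probability mapping are assumptions.

```python
import random

def _dist2(a, b):
    # Squared Euclidean distance between two feature vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def smote(minority, n_new, k=5, rng=None):
    """Basic SMOTE: each synthetic point lies on the segment between a minority
    sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: _dist2(base, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, nb)))
    return synthetic

def weight_undersample(majority, weights, m, rng=None):
    """Keep m majority samples without replacement (weights must be > 0).
    Assumption: selection probability grows as the boosting weight shrinks, so
    low-weight ("centre-region") samples are preferred. Efraimidis-Spirakis keys
    u ** (1 / s) with s = 1 / weight reduce to u ** weight; the m largest win."""
    rng = rng or random.Random(0)
    order = sorted(range(len(majority)),
                   key=lambda i: rng.random() ** weights[i], reverse=True)
    return [majority[i] for i in order[:m]]

minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote(minority, n_new=6, k=2)
majority = [(5.0 + i, 5.0) for i in range(10)]
kept = weight_undersample(majority, [0.05] * 5 + [0.5] * 5, m=4)
```

In the described method only the high-weight minority samples (the boundary region) would be passed to `smote`; the sketch leaves that selection to the caller.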

Weight clustering and under-sampling-based unbalanced data classification method

Inactive | CN106778853A | Effects: To achieve the effect of automatic clustering; Improve classification accuracy | Tags: Character and pattern recognition; Data dredging; Majority class
Classifying unbalanced data sets has become one of the most challenging problems in data mining. Because minority-class samples are far fewer than majority-class samples, conventional learning algorithms tend to classify the minority classes with low accuracy and poor generalization. Ensemble learning has become an important way of addressing this problem, and ensembles based on random under-sampling or on clustering can effectively improve classification performance. However, random under-sampling easily loses information, while clustering-based ensembles are computationally complex and hard to apply widely. The invention provides an improved ensemble classification algorithm that fuses weight-based clustering with under-sampling, namely a weight clustering and under-sampling-based unbalanced data classification method. The samples are divided into clusters according to their weights; from each cluster, a certain proportion of the majority-class samples, chosen according to their weights, and all minority-class samples are extracted to form a balanced data set; and the classifiers are combined within the AdaBoost framework to improve the classification result. According to the applicant's experiments, the algorithm is accurate, simple and highly stable.
Owner:CENT SOUTH UNIV
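One way to build a single balanced training set of this kind is sketched below. It assumes, for illustration only, that "clustering by weight" means splitting the majority samples into groups of similar weight, that a fixed fraction `ratio` is drawn from each group with probability proportional to weight, and that every minority sample is added once. The function names are hypothetical, and the AdaBoost loop that would call this each round is not shown.

```python
import random

def weight_clusters(weights, n_clusters):
    """Split sample indices into up to n_clusters groups of consecutive weights (1-D binning)."""
    order = sorted(range(len(weights)), key=lambda i: weights[i])
    size = -(-len(order) // n_clusters)  # ceiling division
    return [order[j:j + size] for j in range(0, len(order), size)]

def balanced_subset(majority, maj_weights, minority, n_clusters, ratio, rng=None):
    """Draw about `ratio` of each weight cluster (higher weight -> likelier)
    and append all minority samples."""
    rng = rng or random.Random(0)
    chosen = []
    for cluster in weight_clusters(maj_weights, n_clusters):
        k = max(1, round(ratio * len(cluster)))
        # Weighted sampling without replacement via Efraimidis-Spirakis keys u ** (1 / w).
        keyed = sorted(cluster, key=lambda i: rng.random() ** (1.0 / maj_weights[i]),
                       reverse=True)
        chosen.extend(keyed[:k])
    return [majority[i] for i in chosen] + list(minority)

majority = [(float(i),) for i in range(100)]
maj_weights = [0.01 * (i + 1) for i in range(100)]
minority = [(-1.0,)] * 10
subset = balanced_subset(majority, maj_weights, minority, n_clusters=5, ratio=0.1)
```

With 100 majority samples in five clusters and `ratio=0.1`, two samples are drawn per cluster, giving a 10:10 balanced set.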

Software defect prediction optimization method based on differential evolution algorithm

The present invention discloses a software defect prediction optimization method based on a differential evolution algorithm, in the field of software engineering quality assurance. The method comprises the following steps: organizing the modules of the software project, removing comments and similar content from the code, and building a software defect code set; processing the given defect set, including designing defect metrics and labeling defect data, to generate a software defect data set; rebalancing the defect prediction data set with a minority-class oversampling method to a majority-to-minority ratio of 2:1; using the differential evolution algorithm to determine the optimal values of the neural network's hyper-parameters; and evaluating the trained neural network classification model on a test set, where the software defect prediction model is considered successfully established if the performance indicators are met. The method adapts the parameters used to build the classification model to each data set, finds the parameter combination best suited to the current data set and classification model, improves the performance of the software defect prediction model, and reduces the workload of parameter searching during model construction.
Owner:JIANGSU COLLEGE OF ENG & TECH
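The hyper-parameter search step can be illustrated with the classic DE/rand/1/bin scheme. This is a generic sketch, not the patent's exact configuration: `F`, `CR`, the population size and the toy objective (a stand-in for validation error as a function of two hypothetical hyper-parameters) are all assumptions. `oversample_target` shows the minority size implied by the stated 2:1 ratio.

```python
import math
import random

def oversample_target(n_majority, ratio=2.0):
    """Minority-class size needed for a majority:minority ratio of `ratio`:1."""
    return math.ceil(n_majority / ratio)

def differential_evolution(objective, bounds, pop_size=20, F=0.5, CR=0.9,
                           generations=200, seed=0):
    """Minimise `objective` over a box with DE/rand/1/bin."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fitness = [objective(x) for x in pop]
    for _ in range(generations):
        for i in range(pop_size):
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # guarantees at least one mutated coordinate
            trial = []
            for d in range(dim):
                if rng.random() < CR or d == j_rand:
                    v = pop[a][d] + F * (pop[b][d] - pop[c][d])  # mutation
                else:
                    v = pop[i][d]  # inherit from the target vector
                lo, hi = bounds[d]
                trial.append(min(max(v, lo), hi))  # clip to the search box
            f = objective(trial)
            if f <= fitness[i]:  # greedy one-to-one selection
                pop[i], fitness[i] = trial, f
    best = min(range(pop_size), key=fitness.__getitem__)
    return pop[best], fitness[best]

# Toy stand-in for validation error over (log10 learning rate, hidden units).
best_x, best_f = differential_evolution(
    lambda x: (x[0] + 2) ** 2 + ((x[1] - 32) / 10) ** 2,
    bounds=[(-5.0, 0.0), (4.0, 128.0)])
```

In a real pipeline the objective would train the neural network with the candidate hyper-parameters on the rebalanced data and return a validation loss, which is far more expensive than this toy function.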

Automatic incident detection method based on under-sampling and used for unbalanced data set

The invention discloses an under-sampling-based automatic incident detection method for unbalanced data sets. The method comprises the steps of (1) normalizing the measured traffic flow data with min-max normalization and under-sampling the majority class of the training set with the neighborhood cleaning rule to obtain a new, relatively balanced training set; (2) selecting a radial basis function as the kernel of a support vector machine and optimizing the penalty factor C and the kernel parameter g of the support vector machine with an improved grid search algorithm; and (3) training the support vector machine on the relatively balanced training set to obtain an automatic incident detection model for unbalanced data sets. The method addresses the problem that existing traffic incident detection algorithms are not suited to the unbalanced traffic data encountered in practice; according to the applicant, it markedly improves detection performance, shortens the average detection time, and meets the real-time requirements of traffic incident detection.
Owner:SOUTHEAST UNIV
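Step (1) can be sketched with min-max scaling followed by a binary version of the neighborhood cleaning rule as it is usually described (k = 3): a majority sample misclassified by its neighbors is removed, and when a minority sample is misclassified, its majority-class neighbors are removed. The helper names are hypothetical, and the patent's exact cleaning parameters are not given in the abstract.

```python
from collections import Counter

def min_max_normalize(X):
    """Scale every feature column to [0, 1]; constant columns map to 0."""
    cols = list(zip(*X))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [[(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi)]
            for row in X]

def knn(X, i, k):
    """Indices of the k nearest neighbours of X[i] (squared Euclidean distance)."""
    d = lambda a, b: sum((p - q) ** 2 for p, q in zip(a, b))
    return sorted((j for j in range(len(X)) if j != i), key=lambda j: d(X[i], X[j]))[:k]

def neighbourhood_cleaning(X, y, minority_label, k=3):
    """Binary neighbourhood cleaning rule: return the indices to keep."""
    remove = set()
    for i in range(len(X)):
        nbrs = knn(X, i, k)
        vote = Counter(y[j] for j in nbrs).most_common(1)[0][0]
        if vote == y[i]:
            continue  # correctly classified by its neighbours
        if y[i] != minority_label:
            remove.add(i)  # misclassified majority sample (ENN step)
        else:
            # misclassified minority sample: drop its majority-class neighbours
            remove.update(j for j in nbrs if y[j] != minority_label)
    return [i for i in range(len(X)) if i not in remove]

X = [[10, 10], [10, 11], [11, 10], [11, 11], [10.5, 10.5], [0.5, 0.5],
     [0, 0], [0, 1], [1, 0]]
y = [0, 0, 0, 0, 0, 0, 1, 1, 1]
keep = neighbourhood_cleaning(min_max_normalize(X), y, minority_label=1)
```

Here the majority point at (0.5, 0.5), which sits inside the minority cluster, is the only sample removed.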

Multi-target evolutionary fuzzy rule classification method based on decomposition

The invention discloses a decomposition-based multi-objective evolutionary fuzzy rule classification method, which mainly addresses the poor performance of existing classification methods on unbalanced data. The method comprises the steps of: obtaining a training data set and a test data set; normalizing the training data set and dividing it into a majority class and a minority class; initializing the ignore ("don't care") probability, the number of fuzzy partitions and the membership functions; initializing the population and determining rule weights with a fuzzy rule weight formula that includes a weighting factor; setting the stopping criterion, the number of iterations, the step size and the ideal point; dividing the direction vectors among groups; evolving the population and updating it with the Chebyshev (Tchebycheff) approach until the stopping criterion is met; obtaining the classification results on the test data set; and finally projecting the results to obtain and output the AUCH. The method runs fast, classifies well, and can be applied to tumor detection, error detection, credit card fraud detection, spam message recognition and similar fields.
Owner:XIDIAN UNIV
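The Chebyshev (Tchebycheff) update used in decomposition-based evolutionary algorithms such as MOEA/D can be sketched as follows. The sketch assumes two objectives to be minimized (for example, hypothetically, the minority-class and majority-class error rates); the fuzzy-rule encoding and the AUCH projection are not shown, and all names are illustrative.

```python
def weight_vectors_2d(n):
    """n evenly spaced direction (weight) vectors (lambda_1, lambda_2), n >= 2."""
    return [(i / (n - 1), 1 - i / (n - 1)) for i in range(n)]

def update_ideal(ideal, f):
    """Ideal point z*: best (minimum) value seen so far for each objective."""
    return [min(z, v) for z, v in zip(ideal, f)]

def tchebycheff(f, lam, ideal):
    """g(x | lambda, z*) = max_j lambda_j * |f_j(x) - z*_j|."""
    return max(l * abs(v - z) for l, v, z in zip(lam, f, ideal))

def improved_neighbours(pop_f, child_f, neighbour_ids, lams, ideal):
    """Indices of neighbouring subproblems that the child does not worsen;
    the caller replaces those individuals with the child."""
    return [j for j in neighbour_ids
            if tchebycheff(child_f, lams[j], ideal) <= tchebycheff(pop_f[j], lams[j], ideal)]

lams = weight_vectors_2d(3)
pop_f = [(0.3, 0.3), (0.1, 0.9), (0.1, 0.9)]  # objective vectors of current individuals
to_replace = improved_neighbours(pop_f, (0.2, 0.2), [0, 1, 2], lams, ideal=(0.0, 0.0))
```

Each weight vector defines one scalar subproblem, so a child only replaces neighbours whose own subproblem it solves at least as well.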

A multi-classification method based on adaptive balanced integration and dynamic hierarchical decision-making

Inactive | CN109359704A | Effects: Reduce dependence; Solve the problem of unbalanced number of positive and negative samples | Tags: Character and pattern recognition; Majority class; Data set
The embodiment of the invention provides a multi-classification method based on adaptive balanced ensembles and dynamic hierarchical decision-making. The original data set is converted into several binary data sets using the one-versus-rest (one-to-many) decomposition strategy. For each binary data set, the numbers of majority-class and minority-class samples are used as the upper and lower limits, respectively, of a parameter interval, the average per-class accuracy is used as the scoring criterion, and the number of samples to draw for each subset is found by grid search. On this basis, over-sampling and under-sampling are combined to balance each binary data set, several binary classification sub-models are built, and the sub-models are combined by averaging into a binary classification model. From the outputs of all the binary classification models, the spatial position of each test sample under the one-versus-rest framework is determined, and separate classification strategies for the blank area, the intersecting area and the normal area decide the final category of the test sample. The proposed technique can improve the overall recognition rate of the classification model for every category under the one-versus-rest framework.
Owner:BEIJING UNIV OF POSTS & TELECOMM
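The decomposition and the region-based decision can be sketched as below. The abstract does not spell out the per-region strategies, so the rules here are assumptions: a sample claimed by exactly one binary model lies in the normal area, one claimed by none lies in the blank area and takes the highest-scoring class, and one claimed by several lies in the intersecting area and takes the highest-scoring claimant. The threshold of 0.5 and all names are hypothetical.

```python
def one_vs_rest(y, positive):
    """Binary labels for the data set in which `positive` is the positive class."""
    return [1 if label == positive else 0 for label in y]

def sampling_interval(y, positive):
    """Grid-search bounds: (minority count, majority count) of the binary data set."""
    n_pos = sum(one_vs_rest(y, positive))
    n_neg = len(y) - n_pos
    return min(n_pos, n_neg), max(n_pos, n_neg)

def decide(scores, threshold=0.5):
    """scores: {class: score from that class's binary model}.
    Returns (region, predicted class); the per-region rules are assumptions."""
    positives = [c for c, s in scores.items() if s >= threshold]
    if len(positives) == 1:
        return "normal", positives[0]
    if not positives:
        return "blank", max(scores, key=scores.get)
    return "intersecting", max(positives, key=scores.get)

interval = sampling_interval(["a", "a", "b", "c", "c", "c"], "a")
region, label = decide({"a": 0.7, "b": 0.8, "c": 0.1})
```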

A multi-classification oriented unbalanced data preprocessing method and device and an apparatus

Inactive | CN109033148A | Effects: Improve classification accuracy; Resolve problems that arise when conflicts arise | Tags: Special data processing applications; Majority class; Algorithm
The invention discloses a multi-class-oriented unbalanced data preprocessing method, device and apparatus. The method comprises the following steps: receiving the desired final sample set size and the imbalance ratio of the sample set, and computing the ideal number of samples for each class; comparing the ideal and actual sample numbers of each class to decide which sample sets belong to minority classes and which to majority classes; for each sample in a minority-class set, counting how many of its k nearest neighbors belong to its own class and how many to other classes, and marking the sample accordingly; deleting, keeping, copying or synthesizing minority-class samples according to their markers to obtain the final minority-class sample sets; for each sample in a majority-class set, likewise counting own-class and other-class samples among its k nearest neighbors and marking it; and deleting or keeping majority-class samples according to their markers to obtain the final majority-class sample sets. The final sample set is then generated. According to the applicant, the resulting sample set effectively improves the accuracy of multi-class classification algorithms.
Owner:GUANGZHOU UNIVERSITY
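The k-nearest-neighbor marking step can be sketched as follows. The marker thresholds (no own-class neighbors means noise, at most half means border, otherwise safe) and the marker-to-action tables are hypothetical choices for illustration; the abstract only says that minority samples are deleted, kept, copied or synthesized and majority samples are deleted or kept according to their markers.

```python
def nearest(X, i, k):
    """Indices of the k nearest neighbours of X[i] (squared Euclidean distance)."""
    d2 = lambda a, b: sum((p - q) ** 2 for p, q in zip(a, b))
    return sorted((j for j in range(len(X)) if j != i), key=lambda j: d2(X[i], X[j]))[:k]

def marker(X, y, i, k=5):
    """Mark sample i by how many of its k nearest neighbours share its class."""
    same = sum(1 for j in nearest(X, i, k) if y[j] == y[i])
    if same == 0:
        return "noise"
    if same <= k // 2:
        return "border"
    return "safe"

# Hypothetical marker -> action tables (not taken from the patent).
MINORITY_ACTION = {"noise": "delete", "border": "synthesize", "safe": "copy"}
MAJORITY_ACTION = {"noise": "delete", "border": "save", "safe": "save"}

def plan(X, y, minority_classes, k=5):
    """Action for every sample, chosen by its class role and its marker."""
    actions = []
    for i in range(len(X)):
        table = MINORITY_ACTION if y[i] in minority_classes else MAJORITY_ACTION
        actions.append(table[marker(X, y, i, k)])
    return actions

X = [[0], [1], [2], [3], [4], [10], [11], [12], [2.5]]
y = ["A", "A", "A", "A", "A", "B", "B", "B", "B"]
actions = plan(X, y, minority_classes={"B"}, k=3)
```

Here the minority sample at 2.5, surrounded entirely by class A, is marked as noise and deleted, while the clustered minority samples are marked safe.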

Method for extracting sensitive data from unbalanced data based on SVM-forest

Active | CN107728476A | Effects: Reduce imbalance; Classification effect balance | Tags: Adaptive control; Majority class; Test sample
The invention discloses a method for extracting sensitive data from unbalanced data based on an SVM forest. Part of the labeled samples are held out as test samples and the rest are used as training samples; k-means divides the normal working condition class into subclasses, each of which is combined with the fault working condition data to form N training subsets; an SVM forest is trained with the SVM-tree method and evaluated on the test samples; the L trees with the highest misclassification rates on the fault working condition class are selected, and the data that most strongly influence the classification result are retained; a classifier T, built with a selected classification algorithm, is trained on the minority classes and the remaining majority-class samples of the test set; and a temporary test sample set is used to evaluate T, repeating until its performance meets the requirements. Through multiple iterations, the method selects the majority-class samples that most influence classification, which reduces the degree of imbalance, and the resulting classification performance approaches or reaches a balanced classification effect under the same conditions.
Owner:ZHEJIANG UNIV
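The subset-construction step can be sketched with plain k-means on the normal (majority) class, pairing each resulting subclass with all fault (minority) samples. The SVM-tree training and the tree-selection loop are not shown; `kmeans`, `training_subsets`, the label encoding (0 = normal, 1 = fault) and the random initialization are illustrative assumptions.

```python
import random

def kmeans(points, n_clusters, iters=50, seed=0):
    """Plain Lloyd's k-means; returns the list of point groups."""
    rng = random.Random(seed)
    centres = rng.sample(points, n_clusters)  # initial centres drawn from the data
    groups = []
    for _ in range(iters):
        groups = [[] for _ in range(n_clusters)]
        for p in points:
            idx = min(range(n_clusters),
                      key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centres[c])))
            groups[idx].append(p)
        # Recompute centres; an empty group keeps its previous centre.
        new = [tuple(sum(col) / len(g) for col in zip(*g)) if g else tuple(centres[c])
               for c, g in enumerate(groups)]
        if new == [tuple(c) for c in centres]:
            break
        centres = new
    return groups

def training_subsets(normal, fault, n_subsets, seed=0):
    """One (X, y) training subset per normal-class subclass, each with all fault samples."""
    subsets = []
    for cluster in kmeans(normal, n_subsets, seed=seed):
        X = list(cluster) + list(fault)
        y = [0] * len(cluster) + [1] * len(fault)  # 0 = normal, 1 = fault
        subsets.append((X, y))
    return subsets

normal = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
fault = [(2.5, 2.5), (2.6, 2.5)]
subsets = training_subsets(normal, fault, n_subsets=2)
```

Each subset is far less unbalanced than the full data set, which is the point of splitting the majority class before training the forest.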

SAMME.RCW algorithm based face recognition optimization method

The invention relates to a face recognition optimization method based on the SAMME.RCW algorithm. Features are first extracted from a face image, and the resulting image feature vector is classified with the SAMME.RCW algorithm. The weight adjustment process of SAMME.RCW is modified so that no class's sample weights become too small when re-sampling occurs, and so that weight adjustment after re-sampling favors minority-class samples, preserving the classification performance on those samples. SAMME.RCW requires of each weak classifier that, within every class, the weight of the correctly classified samples be greater than the weight of the samples assigned to any other single class, so the accuracy requirement is applied to each class independently. By modifying the weight allocation used in re-sampling, each class of samples has essentially the same probability of being selected, which preserves the weak classifier's performance on both minority-class and majority-class samples. The final strong classifier effectively improves face recognition accuracy.
Owner:BEIJING UNIV OF TECH
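The sketch below shows the standard SAMME boosting update, followed by one hypothetical way to realize the described idea that every class should be about equally likely to be drawn when re-sampling: rescaling the weights so that each class carries a total mass of 1/K. This is not the patented SAMME.RCW weight adjustment, whose exact formulas are not given in the abstract.

```python
import math

def samme_update(weights, y_true, y_pred, n_classes):
    """One standard SAMME round: returns (alpha, updated weights).
    SAMME needs err < 1 - 1/K for alpha to be positive."""
    total = sum(weights)
    err = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p) / total
    err = min(max(err, 1e-10), 1 - 1e-10)  # keep the logarithm finite
    alpha = math.log((1 - err) / err) + math.log(n_classes - 1)
    new = [w * math.exp(alpha) if t != p else w
           for w, t, p in zip(weights, y_true, y_pred)]
    return alpha, new

def equalise_class_mass(weights, y):
    """Assumed re-sampling tweak: rescale so every class has total weight 1/K."""
    classes = set(y)
    mass = {c: sum(w for w, t in zip(weights, y) if t == c) for c in classes}
    return [w / (mass[t] * len(classes)) for w, t in zip(weights, y)]

y_true = [0, 0, 0, 1]
alpha, w = samme_update([1.0] * 4, y_true, [0, 0, 0, 0], n_classes=2)
balanced = equalise_class_mass(w, y_true)
```

After one round the misclassified minority sample has weight 3 and each majority sample weight 1; equalizing then gives each class half of the total sampling mass.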

Text feature selection method based on unbalanced data sets

The invention relates to a text feature selection method for unbalanced data sets, in which the feature sets of unbalanced documents are computed on a computer and a model is then built with a selected classification algorithm. The method comprises the following steps: (1) dividing the data sets into majority classes and minority classes, designating the minority classes as positive classes, denoted \(c_i\), and the majority classes as negative classes, denoted by a symbol given in the patent specification; (2) pre-processing the texts in the data sets, including word segmentation and stop-word removal, to form a set T of features t; (3) computing the quantities A, B, C, D and N for each feature t in the unbalanced class documents; (4) computing a new \(\chi^2(t, c_i)\) statistic for each feature t under each class; (5) setting a threshold for feature screening, sorting the features by their \(\chi^2(t, c_i)\) values, and taking out, per class, a feature set T' containing a specified number of features; and (6) building a model on the selected feature set T' with a suitable classification algorithm, such as a decision tree, a support vector machine or a Bayesian classifier.
Owner:ZHEJIANG UNIV OF TECH
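Steps (3) to (5) can be illustrated with the standard chi-square (CHI) term-class statistic \(\chi^2(t, c) = \frac{N (AD - CB)^2}{(A + C)(B + D)(A + B)(C + D)}\), where A counts documents in c containing t, B documents outside c containing t, C documents in c without t, D documents outside c without t, and N = A + B + C + D. The patent computes a modified ("new") version of this statistic that is not reproduced here; the function names and counts below are illustrative.

```python
def chi_square(A, B, C, D):
    """Standard CHI statistic for term t and class c from the 2x2 contingency counts."""
    N = A + B + C + D
    denom = (A + C) * (B + D) * (A + B) * (C + D)
    return N * (A * D - C * B) ** 2 / denom if denom else 0.0

def top_features(counts, n):
    """counts: {term: (A, B, C, D)} for one class; return the n highest-scoring terms."""
    return sorted(counts, key=lambda t: chi_square(*counts[t]), reverse=True)[:n]

counts = {
    "neutral": (10, 10, 10, 10),    # term independent of the class: chi^2 = 0
    "indicative": (10, 5, 2, 83),   # term concentrated in the class
}
selected = top_features(counts, 1)
```

Selecting the top terms separately for each class, as step (5) describes, keeps the minority classes from being crowded out by features that are frequent only in the majority classes.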