143 results about "Minority class" patented technology

Definition of Minority Class A Members. Minority Class A Members means each of the Common Members who purchased Class A Units in the Offering (excluding any of the Initial Members) and who are listed on Exhibit A annexed hereto.

Unbalanced data classification method based on unbalanced classification indexes and integrated learning

The invention discloses an unbalanced data classification method based on unbalanced classification indexes and integrated learning, and mainly solves the problem of low classification accuracy of the minority class of the unbalanced data in the prior art. The method comprises steps as follows: (1), a training set and a testing set are selected; (2), training sample weight is initialized; (3), part of training samples is selected according to the training sample weight for training a weak classifier, and the well trained weak classifier is used for classifying all training samples; (4), the classification error rate of the weak classifier on the training set is calculated, is compared with a set threshold value and is optimized; (5), voting weight of the weak classifier is calculated according to the error rate, and the training sample weight is updated; (6), whether the training of the weak classifier reaches the maximum number of iterations is judged, if the training of the weak classifier reaches the maximum number of iterations, a strong classifier is calculated according to the weak classifier and the voting weight of the weak classifier, and otherwise, the operation returns to the step (3). The classification accuracy of the minority class is improved, and the method can be applied to classification of the unbalanced data.
Owner:XIDIAN UNIV
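
Steps (4) and (5) of the abstract above resemble a boosting round. The abstract does not give its formulas or the threshold-based optimization, so the following is only a minimal sketch using the standard AdaBoost update as a stand-in; the function name `adaboost_round` and the sample values are hypothetical.

```python
import math

def adaboost_round(weights, correct):
    """One boosting round (standard AdaBoost, assumed here, not taken from the patent).

    weights -- current sample weights, summing to 1
    correct -- correct[i] is True if the weak classifier got sample i right
    Returns (error, vote_weight, new_weights).
    """
    # Weighted error rate of the weak classifier on the training set.
    error = sum(w for w, ok in zip(weights, correct) if not ok)
    # Clamp so a perfect or useless learner does not cause log(0) or division by zero.
    error = min(max(error, 1e-10), 1 - 1e-10)
    # Voting weight: a lower error gives the classifier a larger say in the final vote.
    alpha = 0.5 * math.log((1 - error) / error)
    # Raise the weights of misclassified samples, lower the rest, then renormalise.
    raw = [w * math.exp(-alpha if ok else alpha) for w, ok in zip(weights, correct)]
    total = sum(raw)
    return error, alpha, [r / total for r in raw]

# Four equally weighted samples; the weak classifier misses only the last one.
weights = [0.25, 0.25, 0.25, 0.25]
error, alpha, new_weights = adaboost_round(weights, [True, True, True, False])
```

With this update the misclassified sample ends up holding half of the total weight, so the next weak classifier is pushed toward the samples the previous one got wrong, which in unbalanced data are often minority class samples.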

Unbalanced data sampling method in improved C4.5 decision tree algorithm

The invention relates to an unbalanced data sampling method in an improved C4.5 decision tree algorithm. The method comprises the steps as follows: firstly, initial weights of various samples are determined according to the number of various samples; the weights of the samples are modified through the training result of the improved C4.5 decision tree algorithm in each round; the information gain ratio and misclassified sample weights are taken into account by a division standard of the improved C4.5 algorithm; the final weights of the samples are obtained after T iterations; the samples in minority class boundary regions and majority class center regions are found out according to the sample weights; over-sampling is carried out on the samples in the minority class boundary regions by an SMOTE algorithm; and under-sampling is carried out on majority class samples by a weight sampling method, so that the samples in the center regions are relatively easily selected to improve the balance degree of different classes of data, and the recognition rates of the minority class and the overall data set are improved. According to the unbalanced data sampling method in the improved C4.5 decision tree algorithm, weight modification is carried out through the improved C4.5 decision tree algorithm; and over-sampling and under-sampling are specifically carried out according to the sample weights, so that the phenomena of classifier over-fitting, loss of useful information of the majority class and the like are effectively avoided.
Owner:CHONGQING UNIV OF POSTS & TELECOMM
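
The over-sampling step above relies on SMOTE, which creates synthetic minority samples by interpolating between a minority sample and one of its nearest minority neighbours. The sketch below shows only that interpolation; it does not reproduce the weight-based selection of boundary samples described in the abstract, and the function name `smote_like`, the parameters and the data are illustrative assumptions.

```python
import random

def smote_like(minority, k=2, n_new=4, seed=0):
    """Create n_new synthetic points by SMOTE-style linear interpolation."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding the point itself).
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        # Pick a random position on the segment between base and its neighbour.
        gap = rng.random()
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, other)))
    return synthetic

minority = [(1.0, 1.0), (1.5, 1.2), (2.0, 1.0), (1.2, 1.8)]
synthetic = smote_like(minority)
```

Every synthetic point lies on a segment between two existing minority points, so new samples stay inside the region the minority class already occupies.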

Selective up-sampling combined method for weighted ensemble classification prediction of unbalanced data flows

The invention relates to the technical field of data mining, and discloses a selective up-sampling combined method for weighted ensemble classification prediction of unbalanced data flows. The method comprises the following steps of: screening minority class samples of history data blocks according to a similarity, and selecting samples closest to the current training data block in the aspect of concept; synthesizing the selected samples into new samples in a decision boundary area so as to selectively implement up-sampling; and carrying out weighted ensemble classification on the new sample by adoption of a probability distribution relevancy-based weight distribution strategy. According to the method, the minority class sample information is effectively increased through selecting history data with high similarities and synthesizing new data at the boundary area, so that the decision domain of the minority class is enlarged; and meanwhile, in order to adapt the dynamic data with concept drift and use an ensemble classification thought, the probability distribution relevancy-based weight distribution strategy is designed, so that the overall classification precision is enhanced. Experiment results show that the method is capable of effectively improving the minority class identification rate and the overall classification performance, and has the advantage of better processing the unbalanced data flows.
Owner:NORTHEASTERN UNIV

Weight clustering and under-sampling-based unbalanced data classification method

Inactive | CN106778853A | To achieve the effect of automatic clustering | Improve classification accuracy | Character and pattern recognition | Data dredging | Majority class
The classification of unbalanced data sets has become one of the most challenging problems in data mining. Because the number of minority class samples is far smaller than the number of majority class samples, conventional algorithms classify the minority class with low accuracy and poor generalization performance. Ensemble algorithms have become an important way of dealing with the problem; among them, ensemble algorithms based on random under-sampling and on clustering can effectively improve classification performance. However, the former easily causes information loss, and the latter is computationally complex and difficult to popularize. The invention provides an improved ensemble classification algorithm based on weight clustering and fused with under-sampling, specifically a weight clustering and under-sampling-based unbalanced data classification method. According to the algorithm, clusters are divided according to the weights of the samples, a certain proportion of majority class samples and all minority class samples are extracted from each cluster according to the weight values of the samples to form a balanced data set, and classifiers are integrated by utilizing an Adaboost algorithm framework, so that the classification effect is improved. Experimental results show that the algorithm is accurate, simple and highly stable.
Owner:CENT SOUTH UNIV

Prediction method for unbalanced data set based on isolated forest learning

The invention discloses a prediction method for an unbalanced data set based on isolated forest learning. The prediction method comprises the following steps: receiving a prediction request; collecting data, and defining features and labels in the data set and the number of minority class samples and majority class samples; converting a non-numerical feature column and a label column in the data set into classification numerical values; synthesizing minority class samples by using a majority class weighted minority class oversampling technology to form a balance data set; performing abnormal point identification and removal on the balance data set by using an isolated forest algorithm; then performing data standardization, and dividing a training set and a test set; constructing and training a support vector machine classifier model by using the training set; adjusting hyper-parameters of the support vector machine classifier model through a genetic algorithm, and obtaining a prediction model after training is completed; and inputting the test set into the prediction model to obtain a prediction result. The prediction method for the unbalanced data set based on isolated forest learning has the characteristics of stable prediction result and high prediction precision.
Owner:XIAN UNIV OF TECH

Data classification method, device, electronic device and computer readable medium

The disclosure provides a data classification method, device, electronic device and computer readable medium. The data classification method includes adopting a machine learning method to perform modeling on full training data to obtain an original model, wherein the full training data contains minority class samples; performing screening to obtain new trained data from the full training data based on a minority class proportion threshold value which is a critical value of the proportion of the minority class samples in the full training data; adopting the machine learning method to perform modeling on the new trained data to obtain a new trained model; applying the original model and the new trained model to perform classification forecasting on the new trained data to obtain an original classification result and a new trained classification result; and comparing the accuracy rates of the original classification result and the new trained classification result, and using the one with a higher accuracy rate as a final classification result. The model is retrained aiming at the new trained data with an improved minority class sample proportion, and an original model result is updated, thereby achieving the purpose of improving the accuracy rate of sample classification.
Owner:JINGDONG TECH HLDG CO LTD

Software defect prediction optimization method based on differential evolution algorithm

The present invention discloses a software defect prediction optimization method based on a differential evolution algorithm, and belongs to the field of quality assurance in software engineering. The method comprises the following steps: arranging modules in the software project, cleaning annotations and the like in the code, and establishing a software defect data code set; arranging the given defect set, including the defect metric design, the defect data marks, and the like, to generate a software defect data set; and, with a differential evolution algorithm, creating a ratio of a majority class to a minority class of 2:1 for a defect prediction data set by using a minority class oversampling method, determining an optimal value of the neural network hyper-parameter, using a trained neural network classification model to test on a test set, and if the performance indicators are satisfied, regarding the software defect prediction model as successfully established. According to the method disclosed by the present invention, corresponding parameter factors in the classification model construction can be automatically classified according to the difference of the data sets, a parameter combination most suitable for the current data set and the classification model can be found, the performance of the software defect prediction model can be improved, and the workload of parameter searching in the model construction can be reduced.
Owner:JIANGSU COLLEGE OF ENG & TECH
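
The 2:1 majority-to-minority ratio mentioned above can be illustrated with a small helper. The abstract does not say which oversampling method is used, so this sketch assumes plain random duplication of minority samples; it does not cover the differential evolution search over hyper-parameters, and the name `oversample_to_ratio` and the data are hypothetical.

```python
import math
import random

def oversample_to_ratio(majority, minority, ratio=2.0, seed=0):
    """Duplicate random minority samples until majority:minority is about ratio:1."""
    rng = random.Random(seed)
    # Number of minority samples needed to reach the requested ratio.
    target = math.ceil(len(majority) / ratio)
    extra = max(0, target - len(minority))
    return minority + [rng.choice(minority) for _ in range(extra)]

majority = list(range(10))          # 10 non-defective modules (placeholder ids)
minority = ["bug_a", "bug_b"]       # 2 defective modules (placeholder ids)
resampled = oversample_to_ratio(majority, minority)
```

Here ten majority samples and two minority samples become ten and five, which gives the 2:1 ratio described in the abstract.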

Improved sewage treatment fault diagnosis method integrating weighted extreme learning machine

The invention discloses an improved sewage treatment fault diagnosis method integrating a weighted extreme learning machine. The improved sewage treatment fault diagnosis method comprises the following steps: S1, aiming at a basic classifier, carrying out assignment on an initial weight of the weighted extreme learning machine by adopting an assignment formula inclined to minority class samples; S2, training the basic classifier; S3, providing a novel integrated algorithm basic classifier weight to update the formula; taking the weighted extreme learning machine as the basic classifier; integrating a plurality of basic classifiers by adopting an Adaboost iteration method; establishing an improved sewage treatment fault diagnosis model; S4, inputting sample data generated in a sewage treatment process and setting the quantity T of the basic classifiers of an integrated algorithm, the optimal kernel bandwidth gamma of the basic classifier and a corresponding optimized regularization coefficient C; establishing a fault diagnosis model of a sewage treatment system to carry out performance test. According to the improved sewage treatment fault diagnosis method disclosed by the invention, the classification of unbalanced data of a plurality of types can be realized and the classification performance of the unbalanced data, especially the classification accuracy of minority classes, is improved; the accuracy of fault diagnosis in a sewage treatment process is effectively improved.
Owner:SOUTH CHINA UNIV OF TECH

Transformer fault detecting method based on simplified set unbalanced SVM (support vector machine)

Disclosed is a transformer fault detecting method based on a simplified set unbalanced SVM. The method comprises (1) obtaining a characteristic vector set through a fault characteristic extracting method based on GARCH models; (2) performing determination of boundary samples on minority-class samples to obtain a minority-class boundary sample set S, wherein the minority-class samples are fault samples; randomly selecting N[x] from {2, ..., |S|}, wherein |S| is the cardinal number of S, setting a[i]=1 for i=1, ..., N[x], and setting N[z] to be 1, utilizing a simplified set solution algorithm to obtain Z[1] and repeating the operation N[L]-N[M] times, wherein N[L] is the number of majority samples, N[M] is the number of minority samples, and accordingly, N[L]-N[M] is the number of artificial minority samples, and guaranteeing N[z]=|S| at least once; (3) combining the artificial minority samples obtained in the step (2) with original minority samples to serve as the training samples of an SVM classifier and lastly to obtain an SVM decision model; (4) inputting newly-obtained transformer characteristic vectors into the decision model for judgment. The transformer fault detecting method based on the simplified set unbalanced SVM is applied to transformer fault detection.
Owner:STATE GRID CORP OF CHINA +2

Unbalanced ensemble classification method based on data partition hybrid sampling

The embodiment of the invention provides an unbalanced ensemble classification method based on data partition mixed sampling. The method comprises the following steps: dividing a sample space into four regions according to majority class proportions in minority class neighborhoods; generating a weight according to the ratio of the majority class ratio of each minority class neighborhood to the sum of the majority class ratios, the minority class safety regions, the boundary regions and the minority class noise regions, determining the synthesis number of each minority class neighborhood according to the weight, and performing oversampling on the minority classes of the boundary regions in a random linear interpolation mode; carrying out random under-sampling on the majority class safety regions, removing the minority class noise region samples, reserving the minority class safety region samples, and generating a balanced data set; and constructing three ensemble learning models: an original model biased to majority classes, a local domain reinforcement and weakening model and a hybrid model biased to peripheral boundaries, and adaptively selecting a corresponding model according to the unbalance degree of test point neighbors placed in an original data set.
Owner:BEIJING UNIV OF POSTS & TELECOMM
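
The partition step above assigns each minority sample to a region based on the share of majority samples in its neighbourhood. The abstract does not state the neighbourhood size or the cut-off values, so the sketch below assumes k = 3 nearest neighbours and hypothetical thresholds `safe_max` and `noise_min`; the function names and data are illustrative only.

```python
def majority_fraction(point, samples, k):
    """Fraction of majority-class points among the k nearest neighbours of point."""
    others = [(x, y) for x, y in samples if x is not point]
    others.sort(key=lambda s: sum((a - b) ** 2 for a, b in zip(point, s[0])))
    nearest = others[:k]
    return sum(1 for _, label in nearest if label == "maj") / len(nearest)

def label_minority_regions(samples, k=3, safe_max=0.34, noise_min=1.0):
    """Label each minority sample as 'safe', 'boundary' or 'noise' (assumed thresholds)."""
    regions = {}
    for x, label in samples:
        if label != "min":
            continue
        frac = majority_fraction(x, samples, k)
        if frac >= noise_min:
            regions[x] = "noise"       # surrounded entirely by majority samples
        elif frac <= safe_max:
            regions[x] = "safe"        # surrounded mostly by its own class
        else:
            regions[x] = "boundary"    # mixed neighbourhood: oversampling candidate
    return regions

samples = [
    ((0, 0), "min"), ((0, 1), "min"), ((1, 0), "min"),   # tight minority cluster
    ((3, 3), "min"),                                     # near the majority class
    ((9, 9), "min"),                                     # isolated inside the majority
    ((4, 4), "maj"), ((4, 3), "maj"),
    ((8, 9), "maj"), ((9, 8), "maj"), ((8, 8), "maj"),
]
regions = label_minority_regions(samples)
```

In this toy data the clustered points are labelled safe, the point at (3, 3) is labelled boundary, and the point at (9, 9) is labelled noise, which mirrors how the method treats each region differently during resampling.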

Multi-target evolutionary fuzzy rule classification method based on decomposition

The invention discloses a multi-target evolutionary fuzzy rule classification method based on decomposition, which mainly solves the problem of poor classification effect of an existing classification method on unbalanced data. The multi-target evolutionary fuzzy rule classification method comprises the steps of: obtaining a training data set and a test data set; normalizing and dividing the training data set into a majority class and a minority class; initializing an ignoring probability, a fuzzy partition number and a membership degree function; initializing an original group, and determining weight by adopting a fuzzy rule weight formula with a weighting factor; determining stopping criteria for iteration, iteration times, a step size and an ideal point; dividing direction vectors according to groups; performing evolutionary operation on the original group, and updating the original group by adopting a Chebyshev update mode until the criteria for iteration is stopped; obtaining classification results of the test data set; then projecting to obtain AUCH and output. The multi-target evolutionary fuzzy rule classification method has the advantages of high operating speed and good classification effect and can be applied in the technical fields of tumor detection, error detection, credit card fraud detection, spam messages recognition and the like.
Owner:XIDIAN UNIV

Unbalanced data set-oriented extreme learning machine based transformer fault diagnosis method

The invention discloses an unbalanced data set-oriented extreme learning machine based transformer fault diagnosis method. The method specifically comprises the following steps: step 1, dividing a collected unbalanced sample set S={(x1, t1), (x2, t2)...(xn, tn)} with class labels of an oil-immersed transformer into training samples and test samples by a ratio of 6:1, wherein xi represents a sample property, i may be equal to 1, 2, 3, 4, 5, 6, and specifically comprises six attributes of hydrogen, methane, ethane, ethylene, acetylene and carbon monoxide; ti represents a class label, i may be equal to 1, 2, 3, 4, 5, 6, with 1, 2, 3, 4, 5, 6 respectively corresponding to normal state, middle temperature overheat, high temperature overheat, partial discharge, spark discharge and arc discharge; and the ti is clustered by a PAM algorithm; step 2, for minority classes, taking the cluster center of the PAM algorithm as a central point; and step 3, during the classification output stage of the extreme learning machine, firstly establishing a DAG-ELM model, secondly dividing a new data set generated in step 2 into training sets and test sets by the ratio of 6:1, wherein 6 parts are used for training modeling, and 1 part is used for verifying the classification effect. According to the unbalanced data set-oriented extreme learning machine based transformer fault diagnosis method, the influence of the unbalanced data set on the transformer fault diagnosis result is addressed.
Owner:XI'AN POLYTECHNIC UNIVERSITY

A multi-classification method based on adaptive balanced integration and dynamic hierarchical decision-making

Inactive | CN109359704A | Reduce dependence | Solve the problem of unbalanced number of positive and negative samples | Character and pattern recognition | Majority class | Data set
The embodiment of the invention provides a multi-classification method based on adaptive balanced integration and dynamic hierarchical decision-making, which includes converting the original data set into a plurality of two-class data sets according to a one-to-many decomposition strategy, taking the number of the majority class samples and the minority class samples in each two-class data set as the upper and lower limits of the parameter interval respectively, taking the average accuracy rate of each class as the scoring standard, and obtaining the sampling number of each subset by a grid searching method. Based on this, the over-sampling and under-sampling techniques are combined to balance the two-class data sets to establish a plurality of binary classification sub-models, and the binary classification model is obtained by integrating the sub-models through the averaging method. According to the output results of all the binary classification models, the spatial position information of the test samples is obtained under the one-to-many framework, and the classification strategies for the blank area, the intersecting area and the normal area are established to determine the final category of the test samples. The technical proposal provided by the embodiment of the invention can improve the overall recognition rate of the classification model for each category under the one-to-many framework.
Owner:BEIJING UNIV OF POSTS & TELECOMM
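
The blank, intersecting and normal areas above can be read as the cases where no, several, or exactly one binary model claims a test sample. The abstract does not describe the strategy applied in each area, so this sketch assumes a score threshold of 0.5 and a simple highest-score fallback; `ovr_decide` and the score values are hypothetical.

```python
def ovr_decide(scores, threshold=0.5):
    """Return (region, predicted_class) from one-vs-rest scores.

    scores -- dict mapping class name to that class's binary model score
    """
    # Classes whose binary model accepts the sample.
    claimed = [c for c, s in scores.items() if s >= threshold]
    if len(claimed) == 1:
        return "normal", claimed[0]
    region = "blank" if not claimed else "intersecting"
    # Assumed fallback: choose the highest-scoring class among the candidates.
    candidates = claimed or list(scores)
    return region, max(candidates, key=lambda c: scores[c])

normal = ovr_decide({"a": 0.9, "b": 0.2, "c": 0.1})
blank = ovr_decide({"a": 0.3, "b": 0.4, "c": 0.1})
overlap = ovr_decide({"a": 0.7, "b": 0.8, "c": 0.1})
```

Separating the three cases makes it possible to apply a dedicated rule to ambiguous samples instead of treating every prediction the same way.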

Credit card fraud detection method and system based on undersampling, medium and equipment

The invention provides a credit card fraud detection method and system based on undersampling, a medium and equipment. The method comprises: fitting majority class samples of a training set in a data set by using a Gaussian mixture model; predicting probability density values of minority class samples in the training set by using the fitted Gaussian mixture model, and selecting a maximum value in the probability density values as a cross edge of the two classes of samples; taking the cross edge as a center, extending upwards and downwards from the cross edge to set a sampling upper bound and a sampling lower bound so as to carry out undersampling to obtain an undersampled data set, and combining the undersampled data set with the minority class sample set to form a balanced training set; training a machine learning classifier according to the balanced training set; and detecting a credit card transaction data set by using the trained machine learning classifier. The Gaussian mixture model is used for capturing the samples of the two classes that are distributed at the cross edge, more useful information is provided for recognition of the two classes of samples, and the recognition accuracy of the classifier in the field of credit card fraud detection is improved.
Owner:TONGJI UNIV
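
The density-based undersampling above can be sketched in one dimension. To stay within the Python standard library, a single Gaussian (`statistics.NormalDist`) stands in for the Gaussian mixture model, and the width of the sampling band (`band`) is an assumption, since the abstract does not say how far the bounds extend from the cross edge. The function name and data are hypothetical.

```python
from statistics import NormalDist

def density_band_undersample(majority, minority, band=0.5):
    """Keep majority samples whose density lies in a band around the cross edge."""
    # Stand-in for the Gaussian mixture model: one Gaussian fitted to the majority class.
    model = NormalDist.from_samples(majority)
    # Cross edge: the highest majority-model density observed at any minority sample.
    edge = max(model.pdf(x) for x in minority)
    # Assumed bounds: extend a relative distance `band` below and above the edge.
    lower, upper = edge * (1 - band), edge * (1 + band)
    kept = [x for x in majority if lower <= model.pdf(x) <= upper]
    return kept, kept + minority

majority = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
minority = [3.8, 4.5, 5.0]
kept, balanced = density_band_undersample(majority, minority)
```

Majority samples near the centre of their own distribution have a density well above the band and are dropped, while those with a density similar to the cross edge are kept. Because the criterion is density alone, samples in both tails can be retained.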