143 results about "Minority class" patented technology

Definition of Minority Class A Members. Minority Class A Members means each of the Common Members who purchased Class A Units in the Offering (excluding any of the Initial Members) and who are listed on Exhibit A annexed hereto.

Technique for classifying data

Provided is a system that generates models for classifying input data into a plurality of classes on the basis of training data previously classified into the plurality of classes. The system includes a sampling unit and a learning unit. The sampling unit samples, from the training data, a plurality of datasets each including a predetermined number of elements classified into a minority class and a corresponding number of elements classified into a majority class, the corresponding number being determined in accordance with the predetermined number. The learning unit learns each of a plurality of models for classifying the input data into the plurality of classes, by using a machine learning technique on the basis of each of the plurality of sampled datasets.
Owner:IBM CORP
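
The sampling step in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `sample_balanced_datasets`, the `ratio` parameter that derives the majority count from the minority count, and the fixed seed are all hypothetical. One model would then be trained per returned dataset.

```python
import random


def sample_balanced_datasets(minority, majority, n_datasets, n_minority,
                             ratio=1.0, seed=0):
    """Draw n_datasets subsets, each holding n_minority minority elements and
    round(ratio * n_minority) majority elements (ratio is an assumed knob)."""
    rng = random.Random(seed)
    n_majority = round(ratio * n_minority)
    datasets = []
    for _ in range(n_datasets):
        minority_part = rng.sample(minority, n_minority)  # without replacement
        majority_part = rng.sample(majority, n_majority)
        datasets.append((minority_part, majority_part))
    return datasets


# Toy data: 10 minority ids and 1000 majority ids.
minority = list(range(10))
majority = list(range(100, 1100))
datasets = sample_balanced_datasets(minority, majority,
                                    n_datasets=5, n_minority=8)
```

Because each subset draws a different slice of the majority class, the ensemble as a whole sees far more majority data than any single balanced subset does.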

Unbalanced data classification method based on unbalanced classification indexes and integrated learning

The invention discloses an unbalanced data classification method based on unbalanced classification indexes and integrated learning, and mainly solves the problem of low classification accuracy of the minority class of the unbalanced data in the prior art. The method comprises steps as follows: (1), a training set and a testing set are selected; (2), training sample weight is initialized; (3), part of training samples is selected according to the training sample weight for training a weak classifier, and the well trained weak classifier is used for classifying all training samples; (4), the classification error rate of the weak classifier on the training set is calculated, is compared with a set threshold value and is optimized; (5), voting weight of the weak classifier is calculated according to the error rate, and the training sample weight is updated; (6), whether the training of the weak classifier reaches the maximum number of iterations is judged, if the training of the weak classifier reaches the maximum number of iterations, a strong classifier is calculated according to the weak classifier and the voting weight of the weak classifier, and otherwise, the operation returns to the step (3). The classification accuracy of the minority class is improved, and the method can be applied to classification of the unbalanced data.
Owner:XIDIAN UNIV
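
Steps (4) and (5) of this abstract resemble the bookkeeping of a boosting round. The sketch below uses the standard discrete AdaBoost formulas as a stand-in; the abstract does not give the patent's own error threshold or weight formulas, so the function name `adaboost_round` and the formulas themselves are assumptions made for illustration.

```python
import math


def adaboost_round(weights, y_true, y_pred):
    """One round of standard discrete AdaBoost with labels in {-1, +1}.

    Returns (weighted error, voting weight alpha, normalized new weights).
    """
    err = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p)
    err = min(max(err, 1e-10), 1 - 1e-10)      # guard against log(0)
    alpha = 0.5 * math.log((1 - err) / err)    # voting weight of the weak classifier
    updated = [w * math.exp(-alpha * t * p)    # misclassified samples gain weight
               for w, t, p in zip(weights, y_true, y_pred)]
    total = sum(updated)
    return err, alpha, [w / total for w in updated]


y_true = [1, 1, -1, -1, -1]
y_pred = [1, -1, -1, -1, -1]    # the second sample is misclassified
weights = [0.2] * 5
err, alpha, new_weights = adaboost_round(weights, y_true, y_pred)
```

In this toy round the single misclassified sample ends up carrying half of the total weight, which is how later weak classifiers are pushed toward hard (often minority class) samples.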

Unbalanced data sampling method in improved C4.5 decision tree algorithm

The invention relates to an unbalanced data sampling method in an improved C4.5 decision tree algorithm. The method comprises the steps as follows: firstly, initial weights of various samples are determined according to the number of various samples; the weights of the samples are modified through the training result of the improved C4.5 decision tree algorithm in each round; the information gain ratio and misclassified sample weights are taken into account by a division standard of the improved C4.5 algorithm; the final weights of the samples are obtained after T iterations; the samples in minority class boundary regions and majority class center regions are found out according to the sample weights; over-sampling is carried out on the samples in the minority class boundary regions by an SMOTE algorithm; and under-sampling is carried out on majority class samples by a weight sampling method, so that the samples in the center regions are relatively easily selected to improve the balance degree of different classes of data, and the recognition rates of the minority class and the overall data set are improved. According to the unbalanced data sampling method in the improved C4.5 decision tree algorithm, weight modification is carried out through the improved C4.5 decision tree algorithm; and over-sampling and under-sampling are specifically carried out according to the sample weights, so that the phenomena of classifier over-fitting, loss of useful information of the majority class and the like are effectively avoided.
Owner:CHONGQING UNIV OF POSTS & TELECOMM
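
The abstract says the split criterion considers both the information gain ratio and the sample weights, without giving the formula. One plausible reading, shown here purely as an assumption, is to compute the C4.5 gain ratio with weighted class frequencies, so that up-weighted (misclassified) samples count for more. The names `weighted_entropy` and `gain_ratio` are hypothetical.

```python
import math
from collections import defaultdict


def weighted_entropy(labels, weights):
    """Shannon entropy (bits) of the labels, using sample weights as counts."""
    totals = defaultdict(float)
    for y, w in zip(labels, weights):
        totals[y] += w
    total = sum(totals.values())
    return -sum((t / total) * math.log2(t / total)
                for t in totals.values() if t > 0)


def gain_ratio(feature, labels, weights):
    """C4.5 gain ratio of a categorical feature under sample weights."""
    groups = defaultdict(list)
    for v, y, w in zip(feature, labels, weights):
        groups[v].append((y, w))
    total = sum(weights)
    remainder = 0.0     # weighted entropy left after the split
    split_info = 0.0    # entropy of the split itself
    for members in groups.values():
        share = sum(w for _, w in members) / total
        remainder += share * weighted_entropy([y for y, _ in members],
                                              [w for _, w in members])
        split_info -= share * math.log2(share)
    gain = weighted_entropy(labels, weights) - remainder
    return gain / split_info if split_info > 0 else 0.0


feature = ["a", "a", "b", "b"]
labels = [1, 1, 0, 0]
ratio_uniform = gain_ratio(feature, labels, [1.0, 1.0, 1.0, 1.0])
```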

Image anomaly detection method in combination with CNN migration learning and SVDD

The invention discloses an image anomaly detection method in combination with CNN migration learning and SVDD. The method comprises the steps of manually capturing images around a to-be-detected image object according to video data, making a to-be-detected pillar number data set, expressing image data depth features by utilizing a CNN, fully extracting features of pillar number samples through pre-trained weight and parameter network models, and solving the problem of minority class data in unbalanced data; and constructing a positive sample feature set which needs to participate in training in a classifier, finally performing parameter optimization by utilizing an SVDD algorithm, grid search and the like, forming a normal domain of positive sample feature training, and realizing identification of a number state of a contact network through a boundary. The automated processing level is relatively high, so that the workload of operators can be greatly reduced; and the problem of pillar number anomaly of the contact network is discovered early, so that the inspection efficiency is improved.
Owner:SOUTHWEST JIAOTONG UNIV

Selective up-sampling combined method for weighted ensemble classification prediction of unbalanced data flows

The invention relates to the technical field of data mining, and discloses a selective up-sampling combined method for weighted ensemble classification prediction of unbalanced data flows. The method comprises the following steps of: screening minority class samples of history data blocks according to a similarity, and selecting samples closest to the current training data block in the aspect of concept; synthesizing the selected samples into new samples in a decision boundary area so as to selectively implement up-sampling; and carrying out weighted ensemble classification on the new sample by adoption of a probability distribution relevancy-based weight distribution strategy. According to the method, the minority class sample information is effectively increased through selecting history data with high similarities and synthesizing new data at the boundary area, so that the decision domain of the minority class is enlarged; and meanwhile, in order to adapt the dynamic data with concept drift and use an ensemble classification thought, the probability distribution relevancy-based weight distribution strategy is designed, so that the overall classification precision is enhanced. Experiment results show that the method is capable of effectively improving the minority class identification rate and the overall classification performance, and has the advantage of better processing the unbalanced data flows.
Owner:NORTHEASTERN UNIV

Weight clustering and under-sampling-based unbalanced data classification method

Inactive | CN106778853A | To achieve the effect of automatic clustering | Improve classification accuracy | Character and pattern recognition | Data dredging | Majority class
The classification of unbalanced data sets already becomes one of most challenging problems in data mining. A quantity of minority class samples is far smaller than a quantity of majority class samples, so that the minority classes have the defects of low accuracy, poor generalization performance and the like in a classification learning process of a conventional algorithm. The algorithm integration already becomes an important method for dealing with the problem, wherein random under-sampling-based and clustering-based integrated algorithms can effectively improve classification performance. But, the former easily causes information loss, and the latter is complex in calculation and difficult to popularize. The invention provides a weight clustering-based improved integrated classification algorithm fusing under-sampling, which is specifically a weight clustering and under-sampling-based unbalanced data classification method. According to the algorithm, a cluster is divided according to weights of the samples, a certain proportion of majority classes and all minority classes are extracted from each cluster according to weight values of the samples to form a balanced data set, and classifiers are integrated by utilizing an Adaboost algorithm framework, so that the classification effect is improved. An experimental result shows that the algorithm has the characteristics of accuracy, simplicity and high stability.
Owner:CENT SOUTH UNIV

Prediction method for unbalanced data set based on isolated forest learning

The invention discloses a prediction method for an unbalanced data set based on isolated forest learning. The prediction method comprises the following steps: receiving a prediction request; collecting data, and defining features and labels in the data set and the number of minority class samples and majority class samples; converting a non-numerical feature column and a label column in the data set into classification numerical values; synthesizing minority class samples by using a majority class weighted minority class oversampling technology to form a balance data set; performing abnormal point identification and removal on the balance data set by using an isolated forest algorithm; then performing data standardization, and dividing a training set and a test set; constructing and training a support vector machine classifier model by using the training set; adjusting hyper-parameters of the support vector machine classifier model through a genetic algorithm, and obtaining a prediction model after training is completed; and inputting the test set into the prediction model to obtain a prediction result. The prediction method for the unbalanced data set based on isolated forest learning has the characteristics of stable prediction result and high prediction precision.
Owner:XIAN UNIV OF TECH

Data classification method, device, electronic device and computer readable medium

The disclosure provides a data classification method, device, electronic device and computer readable medium. The data classification method includes adopting a machine learning method to perform modeling on full training data to obtain an original model, wherein the full training data contains minority class samples; performing screening to obtain new trained data from the full training data based on a minority class proportion threshold value which is a critical value of the proportion of the minority class samples in the full training data; adopting the machine learning method to perform modeling on the new trained data to obtain a new trained model; applying the original model and the new trained model to perform classification forecasting on the new trained data to obtain an original classification result and a new trained classification result; and comparing the accuracy rates of the original classification result and the new trained classification result, and using the one with a higher accuracy rate as a final classification result. The model is retrained aiming at the new trained data with an improved minority class sample proportion, and an original model result is updated, thereby achieving the purpose of improving the accuracy rate of sample classification.
Owner:JINGDONG TECH HLDG CO LTD

Self-adaptive oversampling method based on HDBSCAN clustering

The invention discloses a self-adaptive oversampling method based on HDBSCAN clustering, and mainly solves the problem of unbalanced data classification by using complete data information in an existing method. The technology comprises the following steps: (1) inputting a training data set; (2) clustering the minority class samples in the training set to obtain different scales of clusters which are not intersected with each other; (3) calculating the number of samples needing to be synthesized in each minority class cluster; (4) adaptively synthesizing new samples according to the number of samples needing to be synthesized by each cluster to obtain a new minority class data set; (5) forming a new balanced data set by the majority class data set and the new minority class data set; and (6) training and testing the classifier by using the new balance data set. According to the technology, noise in an unbalanced data set can be effectively prevented from being generated, meanwhile, the problem of inter-class and intra-class unbalance is solved, and a brand-new oversampling strategy is provided for unbalanced learning.
Owner:重庆信科设计有限公司 +1

Fault prediction method based on synthetic minority class oversampling and deep learning

The invention provides a fault prediction method based on synthetic minority class oversampling and deep learning. The K-Means method is used to cluster the minority class samples in the sample set, and the noise class clusters are deleted after clustering. A KNN method then divides the samples in each class cluster into noise samples, fault samples and risk samples, and the noise samples are deleted. Finally, a random number is input into each class cluster, and a certain sample is selected as an output sample according to the proportional relation between the random number and the fault class and risk class samples in the class cluster, realizing oversampling by the SMOTE method. The number of minority class samples is then increased through a multiplication operation, so that the classes of the samples in the finally obtained fusion sample are more balanced and the acquired feature data are balanced, thereby facilitating model training, maximally mining the law behind the data, and realizing a better fault prediction effect.
Owner:BEIJING AEROSPACE MEASUREMENT & CONTROL TECH

Construction method for classifier

The invention relates to a construction method for a classifier. The construction method includes the following steps that a part of majority class training samples in a training sample set are removed through an undersampling method, and a current training sample set is updated through the undersampled training sample set, wherein the training sample set comprises the majority class training samples and minority class training samples, and the classes of all the training samples in the training sample set are known; oversampling is conducted on the minority class training samples in the training sample set, and the classifier is constructed through the oversampled training sample set. According to the construction method for the classifier, noise in the training samples is removed effectively, the problem of data imbalance can be solved effectively, the accuracy rate of training sample data classification is greatly increased, the calculation amount is small, and the method is simple.
Owner:HARBIN INST OF TECH

Sampling method for unbalanced transaction data of fictitious assets

The invention discloses a sampling method for unbalanced transaction data of fictitious assets. The method includes the following steps that abnormal transaction data in fictitious asset transaction are defined as a minority class, and oversampling is carried out on samples of the minority class by means of an improved SMOTE method in order to increase the number of the samples of the minority class; normal transaction data in fictitious asset transaction are defined as a majority class, and undersampling is conducted on samples of the majority class by means of a distance-based DUS method in order to decrease the number of the samples of the majority class; a scaling factor is set to adjust the proportion of the oversampling number and the undersampling number. The sampling method for unbalanced transaction data is applied to abnormal transaction detection of the fictitious assets, the calculated amount of abnormal transaction detection can be greatly reduced, and a high accuracy rate can be reached.
Owner:NAT UNIV OF DEFENSE TECH
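
The combination of oversampling, undersampling and a scaling factor can be sketched as below. Classic SMOTE and plain random undersampling are swapped in for the patent's improved SMOTE and distance-based DUS method, whose details the abstract does not give. The reading of the scaling factor (`scale` is the share of the class gap closed by oversampling, the rest by undersampling) is also an assumption, as are the function names.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Classic SMOTE: interpolate between a minority point and one of its
    k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        neighbours = sorted((p for p in minority if p is not x),
                            key=lambda p: math.dist(x, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()                      # position along the segment x -> nb
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic


def resample(minority, majority, scale=0.5, seed=0):
    """Balance the classes: a `scale` share of the size gap is closed by
    oversampling the minority, the remainder by undersampling the majority."""
    rng = random.Random(seed)
    size_gap = len(majority) - len(minority)
    n_new = round(scale * size_gap)
    new_minority = minority + smote(minority, n_new, seed=seed)
    new_majority = rng.sample(majority, len(majority) - (size_gap - n_new))
    return new_minority, new_majority


minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
majority = [(float(i), 5.0) for i in range(20)]
new_minority, new_majority = resample(minority, majority, scale=0.5)
```

A smaller `scale` leans on undersampling and shrinks the training set, which is consistent with the reduced computation the abstract claims.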

Software defect prediction optimization method based on differential evolution algorithm

The present invention discloses a software defect prediction optimization method based on a differential evolution algorithm, and belongs to the field of quality assurance in the software engineering. The method comprises the following steps: arranging modules in the software project, cleaning annotations and the like in the code, and establishing a software defect data code set; arranging the given defect set, including the defect metric design, the defect data marks, and the like, to generate a software defect data set; and with a differential evolution algorithm, creating a ratio of a majority class to a minority class as 2:1 for a defect prediction data set by using a minority class oversampling method, determining an optimal value of the neural network hyper-parameter, using a trained neural network classification model to test in a test set, and if the performance indicators are satisfied, representing that a software defect prediction model is successfully established. According to the method disclosed by the present invention, corresponding parameter factors in the classification model construction can be automatically classified according to the difference of the data sets, a parameter combination most suitable for the current data set and the classification model can be found, the performance of the software defect prediction model can be improved, and the workload of parameter searching in the model construction can be reduced.
Owner:IANGSU COLLEGE OF ENG & TECH

Improved sewage treatment fault diagnosis method integrating weighted extreme learning machine

The invention discloses an improved sewage treatment fault diagnosis method integrating a weighted extreme learning machine. The improved sewage treatment fault diagnosis method comprises the following steps: S1, aiming at a basic classifier, carrying out assignment on an initial weight of the weighted extreme learning machine by adopting an assignment formula inclined to minority class samples; S2, training the basic classifier; S3, providing a novel integrated algorithm basic classifier weight to update the formula; taking the weighted extreme learning machine as the basic classifier; integrating a plurality of basic classifiers by adopting an Adaboost iteration method; establishing an improved sewage treatment fault diagnosis model; S4, inputting sample data generated in a sewage treatment process and setting the quantity T of the basic classifiers of an integrated algorithm, the optimal kernel bandwidth gamma of the basic classifier and a corresponding optimized regularization coefficient C; establishing a fault diagnosis model of a sewage treatment system to carry out performance test. According to the improved sewage treatment fault diagnosis method disclosed by the invention, the classification of unbalanced data of a plurality of types can be realized and the classification performance of the unbalanced data, especially the classification accuracy of minority classes, is improved; the accuracy of fault diagnosis in a sewage treatment process is effectively improved.
Owner:SOUTH CHINA UNIV OF TECH

Supervised image segmentation method for hyperspectral image based migration dictionary learning

The invention discloses a supervised image segmentation method for a hyperspectral image based migration dictionary learning, which mainly solves the problem of unbalance of classes in hyperspectral image segmentation. The implementation process of the method comprises the following steps of: (1) inputting a target image and an auxiliary image, and extracting features; (2) setting loop termination times, training a classifier by a dictionary learning method for a target domain labeled sample set; (3) calculating a migration sample set; (4) updating a minority class sample set in the target domain labeled sample set; (5) calculating the class labels and the classifier weight in a target domain unlabeled sample set in current loop; (6) calculating the class labels of a final target domain unlabeled sample set; (7) outputting the segmentation result of the target image by the obtained class labels of the final target domain unlabeled sample set and the labels of the target domain labeled sample set. The method has the advantage of being efficient in segmentation of the hyperspectral image with unbalanced classes, and can be used for detection and recognition of a radar target.
Owner:XIDIAN UNIV

Transformer fault detecting method based on simplified set unbalanced SVM (support vector machine)

Disclosed is a transformer fault detecting method based on a simplified set unbalanced SVM. The method comprises (1) obtaining a characteristic vector set through a fault characteristic extracting method based on GARCH models; (2) performing determination of boundary samples on minority-class samples to obtain a minority-class boundary sample set S, wherein the minority-class samples are fault samples; randomly selecting N[x]={2, , ISI}, wherein ISI is the cardinal number of S, a[i]=1, and i=1, N[x], and setting N[z] to be 1, utilizing a simplified set solution algorithm to obtain Z[1] and repeating the operation for N[L]-N[M] times, wherein N[L] is the number of majority samples, N[M] is the number of minority samples, and accordingly, N[L]-N[M] is the number of artificial minority samples, and guaranteeing N[z]=ISI for at least one once; (3) combining the artificial minority samples obtained in the step (2) with original minority samples to serve as the training samples of an SVM classifier and lastly to obtain an SVM decision model; (4) inputting newly-obtained transformer characteristic vectors into the decision model for judgment. The transformer fault detecting method based on the simplified set unbalanced SVM is applied to transformer fault detection.
Owner:STATE GRID CORP OF CHINA +2

Unbalanced ensemble classification method based on data partition hybrid sampling

The embodiment of the invention provides an unbalanced ensemble classification method based on data partition mixed sampling. The method comprises the following steps: dividing a sample space into four regions according to majority class proportions in minority class neighborhoods; generating a weight according to the ratio of the majority class ratio of each minority class neighborhood to the sum of the majority class ratios, the minority class safety regions, the boundary regions and the minority class noise regions, determining the synthesis number of each minority class neighborhood according to the weight, and performing oversampling on the minority classes of the boundary regions in a random linear interpolation mode; carrying out random under-sampling on the majority class safety regions, removing the minority class noise region samples, retaining the minority class safety region samples, and generating a balanced data set; and constructing three ensemble learning models: an original model biased to majority classes, a local domain reinforcement and weakening model and a hybrid model biased to peripheral boundaries, and adaptively selecting a corresponding model according to the unbalance degree of test point neighbors placed in an original data set.
Owner:BEIJING UNIV OF POSTS & TELECOMM
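
The region assignment for minority samples can be sketched as below: each minority point is labelled by the share of majority points among its k nearest neighbours. The thresholds (all-majority means noise, at least half means boundary, otherwise safe), the value of k, and the function name are assumptions; the abstract does not state the actual cut-offs.

```python
import math


def label_minority_regions(minority, majority, k=5, boundary_low=0.5):
    """Return (majority_ratio, region) for each minority point, where
    majority_ratio is the share of majority points among its k nearest
    neighbours in the whole data set."""
    pool = [(p, 0) for p in minority] + [(p, 1) for p in majority]
    result = []
    for x in minority:
        nearest = sorted((item for item in pool if item[0] is not x),
                         key=lambda item: math.dist(x, item[0]))[:k]
        ratio = sum(is_majority for _, is_majority in nearest) / k
        if ratio == 1.0:
            region = "noise"       # surrounded only by majority points
        elif ratio >= boundary_low:
            region = "boundary"    # candidate for interpolation-based oversampling
        else:
            region = "safe"
        result.append((ratio, region))
    return result


# Six clustered minority points plus one isolated inside a majority cluster.
minority = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
            (0.05, 0.05), (0.2, 0.2), (10.0, 10.0)]
majority = [(10.1, 10.0), (9.9, 10.0), (10.0, 10.1),
            (10.0, 9.9), (10.1, 10.1), (9.9, 9.9)]
regions = label_minority_regions(minority, majority)
```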

Extremely unbalanced data classification method based on EasyEnsemble algorithm and SMOTE algorithm

Inactive | CN108596199A | Solve the problem of extreme deficiency | Improve reliability | Character and pattern recognition | Data set | Majority class
The invention provides an extremely unbalanced data classification method based on an EasyEnsemble algorithm and an SMOTE algorithm. The method comprises: a plurality of minority class subsets are constructed by using an SMOTE algorithm and minority class samples are increased; random undersampling is carried out on majority classes, and all majority class subsets and minority class subsets are combined to obtain a plurality of training subsets with a fixed sample proportion; noise reduction is carried out on each training subset; AdaBoost classifiers are trained by using the training subset after noise reduction; and then all AdaBoost classifiers are integrated to obtain a final classifier. According to the invention, a problem of shortage of minority class samples is solved; and the unbalancing state of the sample is changed by combining random undersampling. With the noise reduction technology, reliability of a new data set is improved; the classification boundary is smoothened; and majority class information losses are reduced by using an integration method, so that the performance of the classifier is improved.
Owner:BEIJING JIAOTONG UNIV

Improved SMOTE re-sampling method for unbalanced data classification

Inactive | CN107330477A | Solve the problem of blindness in neighbor selection | Reduce overlap | Character and pattern recognition | Algorithm | Minority class
The invention discloses an improved SMOTE re-sampling method for unbalanced data classification. The method comprises clustering a minority class of samples in a sample set by using a K-Means method, deleting the noise sample class with a centroid closest to a majority of samples in each class cluster, classifying each class cluster into three classes by using a KNN method and removing the noise sample class, finally inputting a random number in each class cluster and selecting a certain sample set according to the proportion relation of the random number to the sample set type in the class cluster to carry out the SMOTE method oversampling. Compared with a traditional SMOTE method, the improved K-Means-SMOTE method is significantly improved in effect in a model of predicting the complaint of network television set-top box users.
Owner:NANJING UNIV OF POSTS & TELECOMM

Credit scoring model construction method and device, equipment and storage medium

The invention provides a credit scoring model construction method and a device, equipment and a storage medium. The construction method comprises the steps of dividing an original unbalanced credit data set into a training set and a verification set; wherein a plurality of data samples in the original unbalanced credit data set comprise credit information of a plurality of users, and the plurality of data samples are in one-to-one correspondence with the plurality of users; dividing the data samples in the training set into a majority class of training samples and a minority class of training samples; clustering the majority class training samples by using an unsupervised clustering algorithm to generate a plurality of sample clusters; obtaining a preset number of balance training subsets according to the plurality of sample clusters and the minority class training samples; and constructing a credit scoring model according to the obtained balance training subset, the verification set and a preset decision tree base classifier. According to the method, the classification performance of the credit scoring model can be improved.
Owner:HUNAN UNIV

Deviation classification and parameter optimization method based on least square support vector machine technology

Active | CN103324939A | Reduce search time | Reduced probability of misclassification as acceptable product | Character and pattern recognition | Coupling | Minority class
The invention provides a deviation classification and parameter optimization method based on the least square support vector machine technology. A least square support vector machine is used for serving as a classifier, and is good in popularization capacity, and applicable to occasions with the requirement for high real-time performance, a virtual minority class oversampling algorithm is improved, influence of an isolating sample is eliminated, importance of a boundary sample is highlighted, classification can have a certain deviation, and the probability of wrongly classifying defective products into accepted products is reduced. According to the parameter optimization method based on the least square support vector machine technology, firstly, primary parameter optimization is carried out through a coupling simulated annealing algorithm, then sophisticated search is carried out by using a grid algorithm on the basis, the time for parameter optimization is reduced when a least square support vector machine model is trained, accuracy of classification is higher, and classification performance is improved.
Owner:JIANGNAN UNIV +1

Multi-target evolutionary fuzzy rule classification method based on decomposition

The invention discloses a multi-target evolutionary fuzzy rule classification method based on decomposition, which mainly solves the problem of poor classification effect of an existing classification method on unbalanced data. The multi-target evolutionary fuzzy rule classification method comprises the steps of: obtaining a training data set and a test data set; normalizing and dividing the training data set into a majority class and a minority class; initializing an ignoring probability, a fuzzy partition number and a membership degree function; initializing an original group, and determining weight by adopting a fuzzy rule weight formula with a weighting factor; determining stopping criteria for iteration, iteration times, a step size and an ideal point; dividing direction vectors according to groups; performing evolutionary operation on the original group, and updating the original group by adopting a Chebyshev update mode until the criteria for iteration is stopped; obtaining classification results of the test data set; then projecting to obtain AUCH and output. The multi-target evolutionary fuzzy rule classification method has the advantages of high operating speed and good classification effect and can be applied in the technical fields of tumor detection, error detection, credit card fraud detection, spam messages recognition and the like.
Owner:XIDIAN UNIV

Unbalanced data set-oriented extreme learning machine based transformer fault diagnosis method

The invention discloses an unbalanced data set-oriented extreme learning machine based transformer fault diagnosis method. The method specifically comprises the following steps: step 1, dividing a collected unbalanced sample set S={(x1, t1), (x2, t2)...(xn, tn)} with class labels of an oil-immersed transformer into training samples and test samples by a ratio of 6:1, wherein xi represents a sample property, i may be equal to 1, 2, 3, 4, 5, 6, and specifically comprises six attributes of hydrogen, methane, ethane, ethylene, acetylene, carbon monoxide, ti represents a class label, i may be equal to 1, 2, 3, 4, 5, 6, with 1, 2, 3, 4, 5, 6 respectively corresponding to normal state, middle temperature overheat, high temperature overheat, partial discharge, spark discharge, arc discharge, and the ti is clustered by a PAM algorithm; step 2, for minority classes, taking the cluster center of the PAM algorithm as a central point; and step 3, during the classification output stage of the extreme learning machine, firstly establishing a DAG-ELM model, secondly dividing a new data set generated in step 2 into training sets and test sets by the ratio of 6:1, wherein 6 parts are used for training modeling, and 1 part is used for verifying the classification effect. According to the unbalanced data set-oriented extreme learning machine based transformer fault diagnosis method, the influence of the unbalanced data set on the transformer fault diagnosis result is solved.
Owner:XI'AN POLYTECHNIC UNIVERSITY

Method for keeping balance of implementation class data through local mean

The invention discloses a method for keeping balance of implementation class data through a local mean, which comprises the following steps: (1) distinguishing a minority class through acquiring training data; and calculating the number of majority class data and minority class data, and calculating an integer of the ratio of the number of the majority class data to the number of the minority class data; (2) calculating k neighbors in the minority class for each data in the minority class, and generating new data through weighing the k neighbors; (3) repeatedly generating new data for each data through adjusting parameters in weight and utilizing weighted summation of the k neighbors of each data; (4) marking the new data as the minority class, and merging the new data and original data to obtain balanced two classes data; and (5) further processing the balanced two classes data, i.e. a training sorting algorithm, and realizing sorting of the new unmarked data. According to the invention, the accuracy of medical diagnosis can be improved, the recognition rate of network attack is improved, the recognition rate of server failure is improved, the recognition of garbage pages is improved, and the like.
Owner:SHANDONG NORMAL UNIV
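
Steps (1) to (4) of the local-mean abstract above can be sketched roughly as follows. This is a minimal illustration, not the patented procedure: the abstract does not say how the weights are chosen, so random normalized weights are assumed here, and the function names `k_nearest` and `local_mean_oversample`, the default `k=3` and the `seed` parameter are all hypothetical.

```python
import math
import random


def k_nearest(point, pool, k):
    """Return the k points in pool closest to point (Euclidean), excluding point itself."""
    others = [p for p in pool if p is not point]
    return sorted(others, key=lambda p: math.dist(point, p))[:k]


def local_mean_oversample(minority, majority_count, k=3, seed=0):
    """Grow the minority class with weighted means of each sample's k minority neighbours."""
    rng = random.Random(seed)
    # Step 1: integer ratio of majority size to minority size.
    ratio = majority_count // len(minority)
    synthetic = []
    for x in minority:
        neighbours = k_nearest(x, minority, k)
        # Steps 2-3: draw fresh weights each time (assumed scheme) and
        # create (ratio - 1) new points per original minority sample.
        for _ in range(max(ratio - 1, 0)):
            raw = [rng.uniform(0.1, 1.0) for _ in neighbours]
            total = sum(raw)
            weights = [w / total for w in raw]
            new_point = tuple(
                sum(w * n[d] for w, n in zip(weights, neighbours))
                for d in range(len(x))
            )
            synthetic.append(new_point)
    # Step 4: the new points are labelled as minority and merged with the originals.
    return list(minority) + synthetic


minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
balanced_minority = local_mean_oversample(minority, majority_count=12)
print(len(balanced_minority))
```

Because the weights are normalized to sum to one, every synthetic point is a convex combination of existing minority samples and therefore stays inside the region they span.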

A multi-classification method based on adaptive balanced integration and dynamic hierarchical decision-making

Inactive | CN109359704A | Reduce dependence | Solve the problem of unbalanced number of positive and negative samples | Character and pattern recognition | Majority class | Data set
The embodiment of the invention provides a multi-classification method based on adaptive balance integration and dynamic hierarchical decision, which includes converting the original data set into a plurality of two-class data sets according to a one-to-many decomposition strategy, taking the number of the majority class samples and the minority class samples in each two-class data set as the upper and lower limits of the parameter interval respectively, taking the average accuracy rate of each class as the scoring standard, and obtaining the sampling number of each subset by a grid search method. Based on this, over-sampling and under-sampling techniques are combined to balance the two-class data sets and establish a plurality of binary classification sub-models, and the binary classification model is obtained by integrating the sub-models through the averaging method. According to the output results of all the binary classification models, the spatial position information of the test samples is obtained under the one-to-many framework, and classification strategies for the blank area, the intersecting area and the normal area are established to determine the final category of the test samples. The technical proposal provided by the embodiment of the invention can improve the overall recognition rate of the classification model for each category under the one-to-many framework.
Owner:BEIJING UNIV OF POSTS & TELECOMM
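
The region-based decision described in the abstract above might look something like the sketch below. The abstract names the blank, intersecting and normal areas but does not state the strategy used in each, so the fallback rules here (highest score wins) are an assumption, as are the names `ensemble_score` and `decide` and the `threshold` parameter.

```python
def ensemble_score(sub_model_scores):
    """Average the scores of the sub-models that make up one binary classifier."""
    return sum(sub_model_scores) / len(sub_model_scores)


def decide(scores, threshold=0.5):
    """Map one-to-many binary scores to (region, predicted class).

    scores: dict of class label -> averaged score of that class's binary model.
    """
    positives = {c: s for c, s in scores.items() if s >= threshold}
    if len(positives) == 1:
        # Normal area: exactly one binary model claims the sample.
        return "normal", next(iter(positives))
    if not positives:
        # Blank area: no model claims it; assumed fallback is the highest score overall.
        return "blank", max(scores, key=scores.get)
    # Intersecting area: several models claim it; assumed fallback is the strongest claim.
    return "intersecting", max(positives, key=positives.get)


print(decide({"a": 0.9, "b": 0.2, "c": 0.1}))
print(decide({"a": 0.3, "b": 0.4, "c": 0.1}))
print(decide({"a": 0.8, "b": 0.7, "c": 0.1}))
```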

Network intrusion detection model SGM-CNN based on class imbalance processing

For the data class imbalance problem, the present invention provides an effective network intrusion detection model SGM-CNN based on a Synthetic Minority Over-Sampling Technique (SMOTE) and a Gaussian Mixture Model (GMM) based on a data flow. According to the technical scheme, the method comprises the steps of firstly obtaining a to-be-identified network data flow; and preprocessing the data stream, inputting the preprocessed data stream into a pre-established network intrusion detection model based on a one-dimensional convolutional neural network (1D CNN), and outputting a detection result of the network data stream. The invention provides a class imbalance processing technology, namely an SGM, for large-scale data. The SGM firstly uses SMOTE to perform oversampling on minority class samples, then uses GMM to perform clustering-based downsampling on majority class samples, and finally balances data of each class. According to the SGM method, expensive time and space cost caused by oversampling is avoided, the situation that important samples are lost due to random downsampling is avoided, and the detection rate of minority classes is remarkably increased.
Owner:ZHENGZHOU UNIV

Credit card fraud detection method and system based on undersampling, medium and equipment

The invention provides a credit card fraud detection method and system based on undersampling, a medium and equipment. The method comprises: fitting majority class samples of a training set in a data set by using a Gaussian mixture model; predicting probability density values of minority class samples in the training set by using the fitted Gaussian mixture model, and selecting the maximum of the probability density values as the cross edge of the two classes of samples; taking the cross edge as a center, extending upwards and downwards from the cross edge to set a sampling upper bound and a sampling lower bound so as to carry out undersampling to obtain an undersampled data set, and combining the undersampled data set with the minority class sample set to form a balanced training set; training a machine learning classifier on the balanced training set; and detecting a credit card transaction data set by using the trained machine learning classifier. The Gaussian mixture model is used to capture the samples of the two classes that are distributed at the cross edge, which provides more useful information for distinguishing the two classes and improves the recognition accuracy of the classifier in the field of credit card fraud detection.
Owner:TONGJI UNIV
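
The cross-edge undersampling idea in the abstract above can be illustrated with a simplified sketch. To stay dependency-free it swaps the Gaussian mixture model for a single diagonal Gaussian and works with log densities rather than raw densities; both are simplifications made here, not part of the abstract. The helper names and the `half_width` parameter are hypothetical.

```python
import math


def fit_diag_gaussian(points):
    """Fit per-dimension mean and variance (stand-in for fitting a GMM)."""
    n = len(points)
    dims = len(points[0])
    mean = [sum(p[d] for p in points) / n for d in range(dims)]
    var = [
        max(sum((p[d] - mean[d]) ** 2 for p in points) / n, 1e-9)
        for d in range(dims)
    ]
    return mean, var


def log_density(point, mean, var):
    """Log density of point under the fitted diagonal Gaussian."""
    return sum(
        -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        for x, m, v in zip(point, mean, var)
    )


def band_undersample(majority, minority, half_width=0.5):
    """Keep majority samples whose density lies in a band around the cross edge."""
    mean, var = fit_diag_gaussian(majority)
    # Cross edge: the highest majority-model density reached by any minority sample.
    edge = max(log_density(p, mean, var) for p in minority)
    lower, upper = edge - half_width, edge + half_width
    return [p for p in majority if lower <= log_density(p, mean, var) <= upper]


majority = [(-3.0,), (-2.0,), (-1.0,), (0.0,), (1.0,), (2.0,), (3.0,)]
minority = [(2.5,), (4.0,)]
kept = band_undersample(majority, minority)
# Balanced training set: retained majority samples (label 0) plus all minority samples (label 1).
training_set = [(p, 0) for p in kept] + [(p, 1) for p in minority]
print(sorted(kept))
```

With a real mixture model the band would follow the shape of each component; the single Gaussian here only shows how the upper and lower sampling bounds select majority samples near the class boundary.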

Software defect prediction method based on class imbalance learning algorithm

The invention relates to a software defect prediction method based on a class imbalance learning algorithm. According to the method, a minority class sample is synthesized by using an SWIM oversampling method, so that a data set is converted into moderate imbalance from high imbalance, then minority class misclassification cost most suitable for a current data set is calculated by using a proposed adaptive cost matrix adjustment strategy, and then K weak classifiers are trained according to a training set, so that the classification accuracy of the data set is improved. In the process, the weight of the sample is continuously adjusted, the weight of the wrongly predicted sample is increased, the weight of the correctly predicted sample is reduced, and finally, the K weak classifiers are combined into a composite classifier to predict the category of the to-be-tested sample. According to the method, the problem of low prediction accuracy of minority class samples when the unbalanced data set is predicted is solved, defective modules can be accurately predicted, a test manager is helped to search for defects of software, and the software development cost is reduced.
Owner:HANGZHOU DIANZI UNIV
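
The sample re-weighting described in the abstract above (raise the weight of wrongly predicted samples, lower the weight of correctly predicted ones) resembles an AdaBoost-style update. The sketch below shows one such round under that assumption. The abstract does not specify its adaptive cost matrix, so the `minority_cost` multiplier is a hypothetical stand-in, as are the function and parameter names.

```python
import math


def update_weights(weights, y_true, y_pred, minority_label=1, minority_cost=1.0):
    """One boosting round: return (normalized new weights, classifier vote weight)."""
    wrong = [t != p for t, p in zip(y_true, y_pred)]
    # Weighted error rate of the current weak classifier, clipped away from 0 and 1.
    err = sum(w for w, bad in zip(weights, wrong) if bad) / sum(weights)
    err = min(max(err, 1e-10), 1 - 1e-10)
    alpha = 0.5 * math.log((1 - err) / err)
    updated = []
    for w, t, bad in zip(weights, y_true, wrong):
        # Misclassified samples grow by exp(alpha); correct ones shrink by exp(-alpha).
        factor = math.exp(alpha if bad else -alpha)
        if bad and t == minority_label:
            # Assumed cost term: extra penalty for missing a minority sample.
            factor *= minority_cost
        updated.append(w * factor)
    total = sum(updated)
    return [w / total for w in updated], alpha


weights = [0.25, 0.25, 0.25, 0.25]
y_true = [1, 0, 0, 0]
y_pred = [0, 0, 0, 0]  # the weak classifier misses the single minority sample
new_weights, alpha = update_weights(weights, y_true, y_pred)
print(new_weights, alpha)
```

Setting `minority_cost` above 1.0 shifts even more weight onto missed minority samples in the next round, which is one simple way a misclassification cost could enter the update.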

Method for classifying unbalanced data sets

The invention discloses a method for classifying unbalanced data sets. The method is applied to the fields of network intrusion detection, animal age prediction, vehicle performance evaluation and the like. To solve the problem that the classification precision of minority classes is low in the prior art, the invention starts from the original training data and exploits the relation between minority classes and majority classes in the original data set: SMOTE and the K nearest neighbor algorithm are used to process the original training data set and construct a new set that focuses on the minority classes and the majority class samples related to the minority classes. Two random forests of the same size are then constructed from the original training data and the new set respectively, the decision trees in the two forests are combined into one large forest, and the test set is classified by the combined forest to obtain the classification result. The obtained classification precision is greatly improved compared with the prior art.
Owner:UNIV OF ELECTRONICS SCI & TECH OF CHINA

Oversampling method and device based on SMOTE algorithm and electronic equipment

The invention provides an oversampling method and device based on an SMOTE algorithm and electronic equipment. The method comprises the following steps: acquiring a historical sample data set, and determining positive and negative samples and corresponding numbers thereof; determining majority class sample data and minority class sample data, and performing data vectorization processing; screening target sample data from the minority class sample data set by using a departure point monitoring method; performing oversampling on the target sample data based on an SMOTE algorithm to generate a specific number of new sample data; and obtaining an amplified minority class sample data set according to the generated new sample data and the original minority class sample data. According to the method, while the sampling method is optimized, the problem of data imbalance is solved, the accuracy of model prediction is improved, and the deviation caused by data imbalance is effectively reduced.
Owner:北京淇瑀信息科技有限公司
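
The screen-then-oversample flow in the abstract above can be sketched as below. Reading the "departure point monitoring method" as a distance-based outlier screen is an assumption, and the particular rule used (mean distance to the k nearest minority neighbours compared with a multiple of the median) is hypothetical. The interpolation step follows the standard SMOTE idea of placing a new point on the segment between a sample and one of its nearest neighbours. All function and parameter names are illustrative.

```python
import math
import random
import statistics


def screen_targets(minority, k=2, factor=2.0):
    """Drop minority samples that sit far from their k nearest minority neighbours."""
    def knn_distance(p):
        nearest = sorted(math.dist(p, q) for q in minority if q is not p)[:k]
        return sum(nearest) / len(nearest)

    distances = [knn_distance(p) for p in minority]
    cutoff = factor * statistics.median(distances)  # assumed screening rule
    return [p for p, d in zip(minority, distances) if d <= cutoff]


def smote(targets, n_new, k=2, seed=0):
    """Generate n_new synthetic points by interpolating towards a random near neighbour."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(targets)
        neighbours = sorted(
            (q for q in targets if q is not x), key=lambda q: math.dist(x, q)
        )[:k]
        nb = rng.choice(neighbours)
        u = rng.random()  # position along the segment from x to nb
        synthetic.append(tuple(a + u * (b - a) for a, b in zip(x, nb)))
    return synthetic


minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (10.0, 10.0)]
targets = screen_targets(minority)
new_samples = smote(targets, n_new=6)
# Amplified minority set: original minority samples plus the generated ones.
amplified = minority + new_samples
print(len(targets), len(amplified))
```

Screening before interpolation keeps the isolated sample at (10, 10) from seeding synthetic points in regions where no real minority data lies.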