169 results about "Class imbalance" patented technology

Class imbalance means that the classes in a classification problem are not represented equally, which is quite common in practice, for instance in fraud detection, prediction of rare adverse drug reactions, and prediction of gene families (e.g. kinase, GPCR). Failure to account for the class imbalance often causes...

Class-imbalance problem classification method based on expansion training data set

The invention discloses a class-imbalance classification method based on an expanded training data set. The method comprises the following steps: obtaining the true data set needed by a classification task; screening minority-class samples from the true data set and distinguishing samples that are close to the decision boundary from those far away from it; inputting these samples into a generative adversarial network to obtain artificial samples similar to the true data; adding a certain number of artificial samples to the true data set to obtain a mixed data set; and inputting the mixed data set into a classifier for classification. The method combines a CycleGAN model with the boundary information of the original data set, thus effectively simulating the distribution features of the true data. The method samples the small-sample data to improve classifier precision and effectively prevents class imbalance from affecting the classification task.
Owner:SOUTH CHINA UNIV OF TECH

Software defect prediction method based on genetic algorithm and random forest

Inactive | CN109977028A | Good dimensionality reduction effect | Objective and accurate classification prediction results | Artificial life | Software testing/debugging | Data set | Algorithm
The invention discloses a software defect prediction method based on a genetic algorithm and a random forest. The method comprises the following steps: preprocessing each subset of a software defect data set; performing feature selection based on the genetic algorithm and the random forest algorithm; constructing a random forest classifier; and carrying out software defect prediction, in which the random forest classifier is trained on the processed software defect data set, a classifier with a better classification effect is obtained through repeated experiments, and the processed software defect test set is then input into the trained classifier to obtain the classification result of the test set. The method is well suited to software defect data sets that differ from one another and exhibit class imbalance. Combining the genetic algorithm and the random forest algorithm for feature selection achieves a very good dimensionality reduction effect. An ensemble algorithm based on decision trees learns and predicts independently, and the individual predictions are combined into the final prediction result.
Owner:YANSHAN UNIV

Unbalanced text classification method and system combining SVM and semi-supervised clustering

The invention discloses an unbalanced text classification method and system combining SVM and semi-supervised clustering. The method comprises the steps of: preprocessing the to-be-processed text to obtain text data in vector format, which serves as the data set; training an SVM classifier on the training set to obtain a classification model, and using the model to predict the test set to obtain the category and confidence of the test set; clustering the data set with a semi-supervised clustering algorithm to obtain the category and confidence of the test set; and fusing the categories and confidences obtained by the SVM classifier and the semi-supervised clustering algorithm to obtain the final output. The method combines different types of methods from the field of unbalanced text classification so that their advantages complement each other, and it uses vectorization and normalization. It overcomes the defect that, when high-dimensional sparse text data are processed, classification results are inaccurate because too few texts are labeled. The method effectively solves the problem of text class imbalance.
Owner:JIANGSU UNIV

Multiclass unbalanced genomics data iterative integrated feature selection method and system

The present invention discloses a multiclass unbalanced genomics data iterative integrated feature selection method and system. Aiming at the characteristic of unbalanced data distribution of multi-labeled genomics data, the present invention provides the iterative feature selection method. On the basis of integrating classifiers in a one-to-many manner, undersampling or oversampling and feature selection are iteratively operated, so that samples of a data set gradually reach a balanced state along with gradual decrease of the number of features. By adopting a classifier obtained after integration in the process, classification identification capability on subclass samples can be obviously improved. A weak classifier based on sub balanced data training is integrated into a strong classifier by adopting an integrated learning technology, so that classification accuracy can be obviously improved.
Owner:SHENZHEN UNIV

Classification method of short text

The invention relates to a classification method of short text. The classification method is characterized in that a hyperplane cuts two classes of samples, then the geometric distance between each multiclass sample and the hyperplane is calculated, multiple subdomains are divided according to the geometric distance, each subdomain is endowed with unique weight, the weight of the subdomains decreases gradually along with the increasing of the distance from the subdomains to the hyperplane, sub-sampling is performed on data according to the weight in a sub-sampling stage, and obtained sampled samples are imported into an SVM algorithm to perform classification. By the classification method, the problems of high-dimension sparsity and class imbalance in text classification can be solved effectively.
Owner:TONGJI UNIV

Software defect prediction method based on two-stage wrapping-type feature selection

The invention discloses a software defect prediction method based on two-stage wrapping-type feature selection, and belongs to the field of software quality assurance. The method comprises the following steps: (1) mining the version control system and the defect tracking system of a software project, extracting program modules from them, and carrying out type marking and software measurement on the program modules to generate a defect prediction data set D; (2) carrying out two-stage wrapping-type feature selection on the defect prediction data set so as to remove as many redundant and irrelevant features from the data set D as possible, finally selecting an optimal feature subset FS' from the original feature set FS; and (3) on the basis of the optimal feature subset FS', preprocessing the data set D to form a preprocessed data set D', and finally constructing a defect prediction model with a decision tree classifier. The method can, on the one hand, effectively identify and remove the redundant and irrelevant features in the defect prediction data set and, on the other hand, effectively alleviate the class imbalance problem in that data set, thereby improving the performance of the defect prediction model.
Owner:南京瑞沃软件有限公司

Protein-DNA binding residue prediction method based on sampling and integrated learning

The invention discloses a protein-DNA binding residue prediction method based on sampling and integrated learning. The method comprises the steps of (1) feature extraction and training sample set construction, (2) sampling and model training, (3) model integration, and (4) online prediction. The method is used for solving the shortcomings of low prediction precision caused by the problems of few feature types and class imbalance in protein-DNA binding residue prediction problems and has the advantages of high prediction precision and high generalization ability.
Owner:NANJING UNIV OF SCI & TECH

Multi-feature software defect comprehensive prediction method based on unbalanced noise set

Active | CN111782512A | Solve the problem that the measurement is not comprehensive enough | Reduce complexity | Character and pattern recognition | Software testing/debugging | Data set | Network structure
The invention discloses a multi-feature software defect comprehensive prediction method based on an unbalanced noise set. The method comprises the following steps: constructing an initial data set containing code features, development process features and network structure features; performing preliminary undersampling on the data set to reduce repeated data in the majority class; searching for a k-nearest-neighbor sample set for the metric elements in the data set through propensity score matching; denoising the data set through a threshold judgment on the k nearest neighbor samples; performing sample synthesis on the minority class in the data set and the minority class in the k-nearest-neighbor sample set to eliminate the class imbalance problem of the data set; and adaptively constructing a plurality of machine learning models and selecting the most suitable one to perform defect prediction on the new version of the software. The method solves the class imbalance problem that is common in software defect prediction, and it removes noise samples through noise discrimination based on propensity score matching.
Owner:北京高质系统科技有限公司

MRI brain tumor automatic segmentation method of double-flow decoding convolutional neural network based on edge feature optimization

Active | CN111709952A | Addressing severe imbalances | Focus on learning | Image enhancement | Image analysis | Edge extraction | Feature fusion
The invention discloses an automatic MRI brain tumor segmentation method based on a double-flow decoding convolutional neural network with edge feature optimization. The invention mainly uses two edge-based optimization strategies to improve brain tumor segmentation performance. Firstly, in the network structure, an independent decoding branch is designed to process edge stream information, which is fused into the semantic stream information through feature fusion. Secondly, a regularization loss function penalizes pixels near the edge where the predicted segmentation mask and the label do not match, encouraging the predicted mask to align with the label values around the edge; in training, a new edge extraction algorithm is introduced to provide higher-quality edge labels. In addition, an adaptive class-balancing weight coefficient is added to the cross-entropy loss function, which solves the problem of serious class imbalance in the back propagation of edge extraction. Experiments show that the tumor segmentation precision is effectively improved.
Owner:WUXI TAIHU UNIV +1

Data equalization method based on deep learning multi-weight loss function

The invention relates to a data equalization method based on a deep learning multi-weight loss function. The method comprises the following steps: obtaining a target image data set for training a deep learning model; determining the number of classes C and the size Ni of each class of samples from the target data set; determining the hyper-parameters [alpha] and [gamma] and a weighting coefficient Ci for the importance of each class of samples; determining a multi-weight loss function MWLfocal(z, y); and training the neural network model iteratively, using the multi-weight loss function for error calculation and the back propagation algorithm to update the weight parameters of the model until the network converges to the expected target, which completes training. The loss function addresses imbalance in sample numbers and imbalance in classification difficulty across data classes at the same time, and it can further improve the detection accuracy of key classes. The method can be applied to data sets with a data imbalance problem, effectively relieving the influence of class imbalance.
Owner:UNIV OF SCI & TECH OF CHINA
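
The abstract above does not give the formula for MWLfocal(z, y), so the following is only a minimal sketch of the general idea: a focal-style loss whose per-class weight combines inverse class frequency with an importance coefficient. The function names, the weight formula and its normalisation are all assumptions made for illustration, not the patented loss.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of raw scores z."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def class_weights(class_sizes, importance):
    """Hypothetical per-class weight: importance C_i times inverse frequency
    N / N_i, rescaled so that the weights average to 1."""
    total = sum(class_sizes)
    raw = [imp * total / n for n, imp in zip(class_sizes, importance)]
    mean = sum(raw) / len(raw)
    return [r / mean for r in raw]

def weighted_focal_loss(logits, label, weights, alpha=1.0, gamma=2.0):
    """Focal-style loss for one sample: -alpha * w_y * (1 - p_y)^gamma * log(p_y).
    Rare classes get a larger w_y; hard samples (small p_y) get a larger factor."""
    p = softmax(logits)[label]
    return -alpha * weights[label] * (1.0 - p) ** gamma * math.log(p)

if __name__ == "__main__":
    w = class_weights([900, 100], [1.0, 1.0])
    print(w)
    print(weighted_focal_loss([0.2, 1.5], 1, w))
```

With gamma set to 0 and all weights equal to 1 the expression reduces to ordinary cross entropy, which is a convenient sanity check when tuning the two hyper-parameters.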

Binary classification method for processing non-small cell lung cancer data with missing values and imbalance

The invention relates to a binary classification method for processing non-small cell lung cancer data with missing values and imbalance, and belongs to the technical field of data classification. The method comprises the following steps: firstly, preprocessing the data by filling missing values with medians, eliminating abnormal values with Tukey's method, and normalizing the data with deviation standardization; secondly, balancing the data with the SMOTEENN comprehensive sampling method, which combines oversampling and undersampling; finally, training a random forest classifier on the balanced data set and testing the classification effect on the test set. This yields a binary classification method for non-small cell lung cancer survival prediction that effectively addresses missing values and class imbalance. Experiments on a non-small cell lung cancer data set demonstrate the effectiveness and superiority of the method: the classification precision for non-small cell lung cancer data with missing values and imbalance is improved, and more accurate medical decisions can be made.
Owner:KUNMING UNIV OF SCI & TECH
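
The preprocessing stage described in the abstract above can be sketched for a single numeric feature column as follows. This is an illustration under stated assumptions: "deviation standardization" is taken to mean min-max scaling, Tukey's method is taken with the conventional 1.5 x IQR fences, and the function names are hypothetical. The SMOTEENN balancing and random forest training steps are not shown.

```python
import statistics

def fill_missing_with_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    med = statistics.median(observed)
    return [med if v is None else v for v in values]

def tukey_fences(values, k=1.5):
    """Return (low, high) fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def drop_outliers(values, k=1.5):
    """Keep only the values that lie inside the Tukey fences."""
    low, high = tukey_fences(values, k)
    return [v for v in values if low <= v <= high]

def min_max_normalize(values):
    """Scale values linearly to [0, 1] (assumed meaning of deviation standardization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

if __name__ == "__main__":
    column = [1, 2, None, 3, 4, 100]
    filled = fill_missing_with_median(column)
    cleaned = drop_outliers(filled)
    print(min_max_normalize(cleaned))
```

On a real table the outlier step would drop whole rows rather than single values, so that features and labels stay aligned.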

Class-imbalanced network traffic classification method and device and computer equipment

The invention relates to the technical field of network traffic classification, and in particular to a class-imbalanced network traffic classification method and device and computer equipment. The method comprises the steps of: obtaining to-be-classified network traffic data and extracting features of the network traffic; deleting irrelevant and redundant features with a feature selection algorithm and performing dimension reduction on the remaining features so as to select an optimal feature subset; and inputting the optimal feature subset into a weight-based multi-classifier, performing network traffic classification training in an incremental learning mode, optimizing classifier performance, and classifying the network traffic. To address the unbalanced distribution of network traffic samples, irrelevant and redundant features are deleted, and the recognition rate of small categories is effectively improved while overall classification accuracy is maintained. Introducing incremental learning makes model update training more flexible and shortens the model update period, and using multiple weight-based classifiers reduces the influence of concept drift.
Owner:CHONGQING UNIV OF POSTS & TELECOMM

Pedestrian re-identification method based on distance centralization and projection vector learning

The invention relates to the technical field of pedestrian re-identification in computer vision, in particular to a pedestrian re-identification method based on distance centralization and projection vector learning. The method includes the following steps: Step 1, dividing the data into a pedestrian training set and a test set; Step 2, extracting features of the pedestrian images, including color features and texture features; Step 3, calculating the centralized feature distance; Step 4, building a pedestrian re-identification model based on iterative projection vector learning; Step 5, solving the model iteratively with a conjugate gradient method; and Step 6, calculating the feature distances between different pedestrians in the test set to perform pedestrian re-identification. The overfitting caused by class imbalance is effectively resolved, and the identification accuracy of pedestrian re-identification is improved. The method also improves training speed, suppresses noise well, and is highly robust to pedestrian posture, illumination variation and occlusion.
Owner:CHANGZHOU UNIV

Evaluation method for performance influence degree of classification models by class imbalance

The invention relates to a method for evaluating the degree to which class imbalance influences the performance of classification models. The evaluation method comprises the following steps: (1) building a classification model base; (2) constructing new data sets; (3) predicting the new data sets with the classification models; (4) evaluating the performance of the classification models; and (5) grading the degree of influence. Firstly, typical classification algorithms in machine learning are used to build the classification model base. Secondly, a class-imbalanced data set is selected as the reference data set, a group of new data sets with gradually increasing imbalance ratio is built from it, and different classification models are selected to classify and predict each of the new data sets. Finally, the coefficient of variation is used to evaluate how much the performance of each classification model varies and to assign levels. In this way the influence of class imbalance on the performance of different classification models is evaluated, which provides guidance for research on class imbalance. The evaluation method is highly general across different classification models.
Owner:CHINA UNIV OF MINING & TECH
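
The final evaluation step above can be illustrated with a short sketch. It takes the variation measure named in the abstract to be the coefficient of variation (sample standard deviation divided by the mean), which is an assumption, and the grading thresholds are purely illustrative because the abstract does not state how levels are assigned.

```python
import statistics

def coefficient_of_variation(scores):
    """Sample standard deviation of the scores divided by their mean."""
    return statistics.stdev(scores) / statistics.mean(scores)

def influence_level(cv, thresholds=(0.05, 0.15)):
    """Hypothetical three-level grading of how strongly imbalance affects a model."""
    low, high = thresholds
    if cv < low:
        return "low"
    if cv < high:
        return "medium"
    return "high"

if __name__ == "__main__":
    # One score per data set, ordered by increasing imbalance ratio (made-up numbers).
    scores_by_model = {
        "model_a": [0.90, 0.89, 0.88, 0.88],
        "model_b": [0.90, 0.75, 0.60, 0.40],
    }
    for name, scores in scores_by_model.items():
        cv = coefficient_of_variation(scores)
        print(name, round(cv, 3), influence_level(cv))
```

A model whose score barely moves as the imbalance ratio grows gets a small coefficient and a low influence level; a model whose score collapses gets a large one.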

Network intrusion detection model SGM-CNN based on class imbalance processing

For the data class imbalance problem, the present invention provides an effective network intrusion detection model, SGM-CNN, based on the Synthetic Minority Over-Sampling Technique (SMOTE) and a Gaussian Mixture Model (GMM) applied to data flows. According to the technical scheme, the method comprises the steps of: obtaining a to-be-identified network data flow; preprocessing the data flow; inputting the preprocessed data flow into a pre-established network intrusion detection model based on a one-dimensional convolutional neural network (1D CNN); and outputting a detection result for the network data flow. The invention provides a class imbalance processing technique for large-scale data, namely SGM. SGM first uses SMOTE to oversample minority class samples, then uses GMM to perform clustering-based downsampling on majority class samples, and finally balances the data of each class. SGM avoids the expensive time and space cost caused by oversampling, avoids losing important samples through random downsampling, and remarkably increases the detection rate of minority classes.
Owner:ZHENGZHOU UNIV
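
The oversampling half of SGM relies on SMOTE, whose core idea is linear interpolation between a minority sample and one of its nearest minority neighbours. The sketch below shows only that idea in plain Python; the function name and parameters are hypothetical, and the GMM-based downsampling of the majority class and the 1D CNN are not shown.

```python
import math
import random

def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points, each on the line segment between a
    randomly chosen minority sample and one of its k nearest minority neighbours.
    Requires at least two minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Nearest minority neighbours of the chosen sample, excluding itself.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation position in [0, 1)
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic

if __name__ == "__main__":
    minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
    for point in smote(minority, n_new=5):
        print(point)
```

Because every synthetic point lies between two existing minority samples, the new points stay inside the region the minority class already occupies.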

Advertisement click rate prediction framework and algorithm based on user behaviours

Inactive | CN108830416A | Predict interest drift in real time | Predicting the probability of interest drift in real time | Forecasting | Marketing | Feature extraction | High dimensional
The invention discloses an advertisement click rate prediction framework and algorithm based on user behaviours. ID features and other features are jointly converted at different levels into meaningful numerical features, which reduces feature sparsity and redundancy and improves feature expressiveness. To further improve feature expressiveness, feature selection and feature combination are carried out with a GBDT model, and high-dimensional features are processed with an LR model. Finally, to solve the class imbalance problem, a down-sampling algorithm based on a K_Means model is provided. In the experimental process, features are first extracted from the original features; the features are then classified using heuristic thinking; feature combination is carried out by inputting perceptual features into the GBDT model; finally, rational features and combination features are input into the LR model with a certain weight to predict the advertisement click rate. Experimental results show that the algorithm improves on both the RMSE and R2 indexes.
Owner:SICHUAN UNIV

Data resampling method based on repeated editing nearest neighbor and clustering oversampling

Inactive | CN110942153A | Solve the class imbalance problem | Improve classification effect | Machine learning | Distance matrix | Algorithm
The invention relates to a data resampling method based on repeated editing nearest neighbor and clustering oversampling. The method comprises the steps of: calculating the Euclidean distance between each to-be-sampled sample and nearby samples, selecting the sample with the smallest distance as its nearest neighbor, comparing whether the labels of the sample and its nearest neighbor are the same, and deleting the sample if the labels differ; dividing the remaining samples into k clusters using K-means, and filtering out the clusters of which the ratio of the number of majority class samples to the number of minority class samples is less than an imbalance rate threshold c; calculating the Euclidean distances between the minority class samples in each cluster, constructing a distance matrix for the cluster, summing all off-diagonal elements of the matrix, and dividing the sum by the number of off-diagonal elements to obtain the average distance of the cluster; calculating a sparse factor for each cluster; and calculating a resampling weight for each cluster and determining the number of new samples to generate according to the weights using the SMOTE method. The method solves the class imbalance problem in the data, so that the classifier can achieve a better classification effect.
Owner:NORTHWESTERN POLYTECHNICAL UNIV
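
The average-distance step in the abstract above is specified precisely enough to sketch directly: build the pairwise Euclidean distance matrix of a cluster's minority samples, sum the off-diagonal entries, and divide by their count. The function names are hypothetical, and the sparse factor and resampling weights are left out because the abstract does not define them.

```python
import math

def distance_matrix(points):
    """Pairwise Euclidean distances between the minority samples of one cluster."""
    return [[math.dist(p, q) for q in points] for p in points]

def mean_off_diagonal(matrix):
    """Sum of all off-diagonal elements divided by the number of such elements."""
    n = len(matrix)
    if n < 2:
        return 0.0  # a single sample has no pairwise distance
    total = sum(matrix[i][j] for i in range(n) for j in range(n) if i != j)
    return total / (n * n - n)

if __name__ == "__main__":
    cluster_minority = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
    print(mean_off_diagonal(distance_matrix(cluster_minority)))
```

A larger average distance indicates a sparser cluster of minority samples.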

Classification of class-imbalanced data

Inactive | CN104933053A | Full information mining | Increase diversity | Special data processing applications | Imbalanced data | Test set
The present invention relates to the data mining technology, and especially relates to a method for training a class-imbalanced data classifier, a class-imbalanced data classifier and a method for classifying the class-imbalanced data. According to one embodiment of the method for training the class-imbalanced data classifier, data classified by the class-imbalanced data classifier has a plurality of properties. The method comprises the following steps that the properties are divided into a plurality of property groups, each property group corresponds to one sub-classifier, and each sub-classifier is suitable for classifying the data based on the corresponding property group, so as to obtain an ultimate classification result by the classification results of the sub-classifiers according to pre-set rules; training data samples are divided into multiple test sets; and for each property group, the corresponding sub-classifiers are trained by using different test sets.
Owner:CHINA UNIONPAY

Software defect prediction method based on class imbalance learning algorithm

The invention relates to a software defect prediction method based on a class imbalance learning algorithm. The method synthesizes minority class samples using the SWIM oversampling method, so that the data set is converted from high imbalance to moderate imbalance; it then calculates the minority class misclassification cost most suitable for the current data set using a proposed adaptive cost matrix adjustment strategy, and trains K weak classifiers on the training set, which improves the classification accuracy on the data set. During this process the sample weights are continuously adjusted: the weight of each wrongly predicted sample is increased and the weight of each correctly predicted sample is reduced. Finally, the K weak classifiers are combined into a composite classifier that predicts the category of the to-be-tested sample. The method solves the problem of low prediction accuracy for minority class samples on unbalanced data sets, so that defective modules can be accurately predicted, test managers are helped to find software defects, and software development cost is reduced.
Owner:HANGZHOU DIANZI UNIV
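
The reweighting described in the abstract above (raise the weight of wrongly predicted samples, lower the weight of correctly predicted ones) matches the classic AdaBoost update, sketched below for one boosting round. This is plain AdaBoost shown as an assumption about the general mechanism; the patent's adaptive cost matrix and SWIM oversampling are not reproduced, and the function name is hypothetical.

```python
import math

def adaboost_update(weights, correct):
    """One AdaBoost round.

    weights: current sample weights.
    correct: booleans, True where the weak classifier predicted the sample correctly.
    Returns (alpha, new_weights), where alpha is the classifier's vote weight.
    """
    error = sum(w for w, ok in zip(weights, correct) if not ok) / sum(weights)
    error = min(max(error, 1e-10), 1 - 1e-10)  # guard against log(0) and division by zero
    alpha = 0.5 * math.log((1 - error) / error)
    # Misclassified samples are scaled up by e^alpha, correct ones down by e^-alpha.
    updated = [w * math.exp(-alpha if ok else alpha) for w, ok in zip(weights, correct)]
    total = sum(updated)
    return alpha, [w / total for w in updated]

if __name__ == "__main__":
    alpha, new_weights = adaboost_update([0.25] * 4, [True, True, True, False])
    print(alpha, new_weights)
```

After the update the misclassified samples carry half of the total weight, which forces the next weak classifier to concentrate on them.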

Flight delay early warning method based on evolutionary sub-sampling integrated learning

The invention discloses a flight delay early warning method based on evolutionary sub-sampling integrated learning and belongs to the technical field of airport flight delay early warning. The method specifically comprises the following steps: firstly, discretizing the target attributes of the measured flight delay data sets and removing noise points to obtain standardized data sets; then, using an evolutionary sub-sampling method to sub-sample the majority class of the class-imbalanced data sets T times, constructing T balanced training sets; using grid search to optimize the parameters of a classification and regression decision tree classifier on each balanced training set, generating the classifiers; and finally, determining an optimal integration mode to combine the classifiers into an integrated system, EUS-Bag, which is the flight delay early warning model. The early warning model can provide an air management department with a decision-making basis for reasonable air traffic scheduling. The method is highly intelligent and effectively improves the accuracy and reliability of flight delay early warning.
Owner:NANJING UNIV OF AERONAUTICS & ASTRONAUTICS
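
The structure of building T balanced training sets and combining the resulting classifiers can be sketched as below. Note the substitution: plain random undersampling stands in for the evolutionary sub-sampling named in the abstract, and simple majority voting stands in for the unspecified "optimal integration mode". Both function names are hypothetical.

```python
import random

def balanced_bags(majority, minority, n_bags, seed=0):
    """Build n_bags training sets, each containing all minority samples plus an
    equally sized random draw (without replacement) from the majority class."""
    rng = random.Random(seed)
    size = len(minority)
    return [rng.sample(majority, size) + list(minority) for _ in range(n_bags)]

def majority_vote(predictions):
    """Combine the labels predicted by the individual classifiers for one sample."""
    return max(set(predictions), key=predictions.count)

if __name__ == "__main__":
    on_time = [("on_time", i) for i in range(100)]   # majority class (made-up data)
    delayed = [("delayed", i) for i in range(5)]     # minority class (made-up data)
    bags = balanced_bags(on_time, delayed, n_bags=4)
    print([len(bag) for bag in bags])
    print(majority_vote(["delayed", "on_time", "delayed"]))
```

Each bag would then be used to train one decision tree, and the trees' predictions for a new flight would be combined by the voting function.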

Forest fire danger grade determination method and system based on one-class SVM

The invention provides a forest fire danger grade determination method and a forest fire danger grade determination system based on a one-class SVM. The forest fire danger grade determination method comprises the steps of: regarding a day as a sample unit, and selecting samples suffering fire disasters as modeling samples according to fire data; acquiring meteorological factors corresponding to the modeling samples; constructing a one-class SVM model based on the meteorological factors corresponding to the modeling samples; constructing a forest fire danger occurrence probability model, namely, mapping a value-taking interval of distances from the samples output by the one-class SVM model intermediately to a sphere center of a hypersphere in the one-class SVM model to [0, 1], and taking a mapping result as a forest fire danger occurrence probability; and calculating a forest fire danger occurrence probability of a sample to be detected, and determining a forest fire danger grade according to the forest fire danger occurrence probability of the sample to be detected. The forest fire danger grade determination method and the forest fire danger grade determination system based on the one-class SVM effectively overcome the problem of class imbalance due to concentration of the forest fire samples, and improve the accuracy degree of forest fire danger determination.
Owner:上海事凡物联网科技有限公司

Image enhancement method based on deep learning

The invention discloses an image enhancement method based on deep learning. The method comprises the following steps: collection and processing of an original image, augmentation processing of the original image, sharpening processing of a target image, construction of a training set, construction of a data set and removal of class imbalance in the data set, removal of redundancy in the training set by wavelet transform, and training of the data set and the training set by a convolutional neural network. Gray processing of the original image effectively removes redundant information from it, and removing redundancy through wavelet transform comprehensively improves the training precision and effectiveness of the original image when it later serves as a training set. In addition, the image can be denoised in real time during the process, yielding an image with a good visual effect. Class imbalance is removed with an interpolation-based SMOTE method, which effectively solves the sample class imbalance problem and keeps the image enhancement effect stable.
Owner:JIANGXI UNIVERSITY OF FINANCE AND ECONOMICS

A support vector machine integrated learning method based on AdaBoost

The invention relates to a support vector machine integrated learning method based on AdaBoost. In order to overcome the defect that an existing support vector machine learning method is low in precision when processing a class imbalance classification problem, the invention provides a support vector machine integrated learning method based on AdaBoost, and a weighted support vector machine (W-SVM) is used. According to the method, sample distribution information can be deeply mined, prediction precision can be remarkably improved, and the method is an effective tool for solving the class imbalance problem.
Owner:CHINA UNIV OF PETROLEUM (EAST CHINA)

Job shop real-time scheduling method based on PCA-XGBoost-IRF

The invention discloses a job shop real-time scheduling method based on PCA-XGBoost-IRF. The method comprises the steps of 1, constructing a standard data sample; 2, pre-processing the sample data, performing abnormal value processing, class imbalance processing and normalization processing on the sample data, and segmenting a data set to meet the input requirements for decision model construction; 3, carrying out feature engineering processing on a training set, wherein the feature engineering processing comprises feature extraction, feature importance calculation and feature selection; 4, carrying out decision model construction based on an improved random forest, including random forest model construction, improvement of an RF model to obtain an IRF model, and optimization of hyper-parameters of the IRF model based on grid search; 5, performing PCA-XGBoost-IRF decision model training based on the optimal parameters; and 6, realizing the real-time selection and decision-making of a dynamic job shop scheduling rule by using a decision-making model based on PCA-XGBoost-IRF. According to the present invention, the real-time scheduling method which is more reliable and higher in robustness and generalization is provided for the intelligent scheduling research based on data driving.
Owner:XINJIANG UNIVERSITY

A public opinion tendency identification method for training sample category distribution imbalance

The invention discloses a public opinion tendency identification method for training samples with imbalanced category distribution. The method comprises: first collecting vocabulary related to the public opinion field of interest as public opinion hot words to create a lexicon; crawling a comment data set from a public opinion information source and dividing it into a training set and a test set; then manually classifying the public opinion tendency of the training set and, to address class imbalance, supplementing the samples with a bootstrap learning method; extracting features from each class of training samples; training a model with naive Bayes, a support vector machine, a decision tree and other algorithms; classifying the test set data with the trained model; and identifying the public opinion tendency from the classification result. Bootstrap learning, feature vector construction and classification model training all use a time-sensitive weighting method, so that the identified public opinion tendency is more timely. The method solves the problem of inaccurate classification caused by imbalanced training data, and improves both the accuracy of public opinion tendency identification and the timeliness of public opinion analysis.
Owner:WUHAN UNIV
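
The abstract above names neither the bootstrap procedure nor the time-sensitive weighting formula. The sketch below reads "bootstrap" as sampling with replacement and assumes an exponential decay with a half-life in days; both readings, and the names `time_weight` and `bootstrap_minority`, are illustrative assumptions rather than the patented method.

```python
import random


def time_weight(age_days, half_life_days=7.0):
    """Assumed decay: a comment loses half its weight every half_life_days."""
    return 0.5 ** (age_days / half_life_days)


def bootstrap_minority(samples, ages_days, target_size,
                       half_life_days=7.0, seed=0):
    """Pad a minority class up to target_size by drawing with replacement,
    favouring recent comments through their time weights."""
    rng = random.Random(seed)
    weights = [time_weight(age, half_life_days) for age in ages_days]
    n_extra = max(0, target_size - len(samples))
    extra = rng.choices(samples, weights=weights, k=n_extra)
    return list(samples) + extra
```

The original samples are always kept; only the added copies are drawn according to recency.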

Mature miRNA full-site recognition method based on SVM-AdaBoost

Inactive | CN109390037A | Reduce the number of offsets | Improve recognition accuracy | Proteomics | Genomics | Data set | Feature set
A mature miRNA full-site recognition method based on SVM-AdaBoost, belonging to the field of bioinformatics. Existing single classifiers suffer from low accuracy and class imbalance when recognizing mature miRNAs. The method comprises the steps of: selecting pre-miRNA sequences from the miRBase database and building a training set and a test set from the selected sequences; extracting biological features of mature miRNA splicing sites based on the structured sequence; obtaining a new feature set with an information gain feature selection algorithm; constructing a probability-based SVM classifier model with adjustable parameters; constructing an ensemble classifier model based on the AdaBoost algorithm; and training a miRNA splicing full-site classifier. The method improves recognition accuracy and reduces the average nucleotide offset. A comparison of several mature miRNA recognition methods on the same test set shows that the proposed method achieves higher classification performance.
Owner:QIQIHAR UNIVERSITY
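
The information gain feature selection step mentioned above can be sketched for discrete features as follows. This uses the textbook definition (label entropy minus conditional entropy given the feature); the feature names in the usage note and the helper `top_features` are hypothetical, and the patent's actual features and thresholds are not given in the abstract.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy of a label list, in bits."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus their entropy after grouping by feature."""
    groups = defaultdict(list)
    for value, label in zip(feature_values, labels):
        groups[value].append(label)
    conditional = sum(len(g) / len(labels) * entropy(g)
                      for g in groups.values())
    return entropy(labels) - conditional


def top_features(columns, labels, keep):
    """Rank feature columns (name -> values) by information gain."""
    ranked = sorted(columns,
                    key=lambda name: information_gain(columns[name], labels),
                    reverse=True)
    return ranked[:keep]
```

For example, with labels `[1, 1, 0, 0]`, a column `["A", "A", "G", "G"]` separates the classes perfectly and gets a gain of 1 bit, while `["A", "G", "A", "G"]` carries no information and gets a gain of 0.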

Multi-class unbalanced remote sensing land cover image classification method based on integrated intervals

The invention discloses a multi-class unbalanced remote sensing land cover image classification method based on integrated intervals, and mainly solves the problem of low classification precision on unbalanced images in the prior art. The implementation comprises the following steps: acquiring unbalanced training samples and pre-classifying them with a random forest classification algorithm; counting the votes for the pre-classified unbalanced training samples and establishing a voting-based integrated interval model; sorting the unbalanced training samples according to class size and integrated interval value, keeping the smallest class, randomly selecting samples from the remaining classes at an undersampling rate, and constructing new balanced training subsets; and inputting each new balanced training subset into a CART decision tree and generating an ensemble learning model through the majority voting principle to obtain the final classification result of the unbalanced remote sensing image. Through the integrated interval model, the method effectively reduces the loss of useful information during classification, is highly robust to noise, trains quickly, and can be used for land cover and environment monitoring.
Owner:XIDIAN UNIV
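
The abstract above does not define the voting-based integrated interval. One common voting-based quantity is the ensemble margin: the votes for the true class minus the largest vote count for any other class, divided by the number of trees. The sketch below assumes that definition; `ensemble_margin` and `rank_by_margin` are hypothetical names.

```python
from collections import Counter


def ensemble_margin(tree_votes, true_label):
    """Assumed margin: (votes for the true class - most votes for any other
    class) / number of trees. The value lies in [-1, 1]."""
    counts = Counter(tree_votes)
    true_votes = counts.get(true_label, 0)
    best_other = max((n for label, n in counts.items() if label != true_label),
                     default=0)
    return (true_votes - best_other) / len(tree_votes)


def rank_by_margin(all_votes, true_labels):
    """Sample indices sorted from lowest margin (hardest) to highest."""
    margins = [ensemble_margin(votes, label)
               for votes, label in zip(all_votes, true_labels)]
    return sorted(range(len(margins)), key=lambda i: margins[i])
```

Under this reading, low-margin samples lie near class boundaries, so an undersampling scheme can use the ranking to decide which majority-class samples to keep.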

Product key part state classification method for class imbalance data

The invention discloses a product key part state classification method for class imbalance data. The method comprises the steps of: obtaining and preprocessing an auxiliary training set and a source training set; undersampling the majority class samples in the source training set N times to obtain N relatively balanced sub-data sets; training N SVM classifiers in parallel on the N sub-data sets and combining them by voting to obtain a final prediction result; taking the minority class auxiliary data out of the final prediction result and adding it to the source training set; constructing a deep learning classification model and training it with supervision on the reconstructed source training set; and performing detection on the sensor data to be predicted. The method makes full use of the labeled data in the source training set and the unlabeled data in the auxiliary data set, processes them with a weakly supervised learning method, reduces the imbalance ratio of the class imbalance data, and improves the prediction performance of the classification model.
Owner:ZHEJIANG UNIV
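
The repeated undersampling and voting steps above can be sketched as follows. The SVM training itself is omitted; the sketch only shows how N balanced subsets might be drawn and how N predictions might be combined. Drawing exactly as many majority samples as there are minority samples is an assumption (the abstract says only "relatively balanced"), and both function names are hypothetical.

```python
import random
from collections import Counter


def balanced_subsets(majority, minority, n_subsets, seed=0):
    """Draw n_subsets training sets, each holding every minority sample plus
    an equally sized random draw (without replacement) from the majority."""
    rng = random.Random(seed)
    size = min(len(minority), len(majority))
    return [rng.sample(majority, size) + list(minority)
            for _ in range(n_subsets)]


def majority_vote(predictions):
    """Combine the labels predicted by the N classifiers for one sample."""
    return Counter(predictions).most_common(1)[0][0]
```

Because each subset sees a different slice of the majority class, the ensemble as a whole uses more of the majority data than any single undersampled classifier would.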

Data resampling method based on clustering oversampling and instance hardness threshold

Inactive | CN112115992A | Reduce the risk of fitting | Less predictable | Character and pattern recognition | Data set | Minority class
The invention provides a data resampling method based on clustering oversampling and an instance hardness threshold. The method comprises the following steps: firstly, clustering the data set with the K-means method, then filtering the clusters and assigning sampling weights to them; then, oversampling the data set with the SMOTE algorithm to generate new data, so that the number of minority class samples equals the number of majority class samples and the data set becomes class-balanced; and finally, cleaning the data with an instance hardness threshold algorithm to obtain a final balanced data set with fewer noisy points. The method turns a class-imbalanced data set into a balanced one and improves the prediction performance of the classifier on minority class samples.
Owner:NORTHWESTERN POLYTECHNICAL UNIV
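
The SMOTE oversampling step above generates each synthetic point by interpolating between a minority sample and one of its nearest minority neighbours. The following is a minimal sketch of that interpolation only; the clustering, cluster filtering and instance hardness cleaning described in the abstract are not shown, and the function name `smote` and its parameters are illustrative.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points from a list of minority-class tuples."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority neighbours of the chosen base sample.
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: math.dist(base, p))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b)
                               for b, o in zip(base, other)))
    return synthetic
```

For instance, `smote([(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)], n_new=4, k=2)` returns four points, each lying on a segment between two of the three originals.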

Few-sample learning classifier construction method based on unbalanced data

The invention relates to a few-sample learning classifier construction method based on unbalanced data, and belongs to the technical field of computer data classification. The method comprises the following steps: firstly, designing a twin (Siamese) parallel fully connected network for feature learning on input sample pairs, according to the primary learning and few-sample learning characteristics of twin neural networks; and then, handling the imbalance of the input sample pairs with a cost-sensitive optimizer, designing an expected misclassification cost function according to the different misclassification costs, and integrating this cost function into the network parameter optimization algorithm to adjust the class imbalance classification weights. The method obtains better classification results on unbalanced, high-dimensional and limited target data sets, and its classification performance is more stable.
Owner:CHONGQING UNIV
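
The abstract above does not give the expected misclassification cost function. A common textbook form is the sum, over true classes, of the class probability times the cost of the chosen prediction. The sketch below applies that form as a decision rule rather than as a network loss, which is a simplification; the cost values in the usage note and both function names are hypothetical.

```python
def expected_costs(class_probs, cost_matrix):
    """Expected cost of predicting each class.

    cost_matrix[t][p] is the assumed cost of predicting class p when the
    true class is t; class_probs[t] is the estimated probability of class t.
    """
    n = len(class_probs)
    return [sum(class_probs[t] * cost_matrix[t][p] for t in range(n))
            for p in range(n)]


def min_cost_class(class_probs, cost_matrix):
    """Pick the prediction with the lowest expected misclassification cost."""
    costs = expected_costs(class_probs, cost_matrix)
    return min(range(len(costs)), key=costs.__getitem__)
```

With probabilities `[0.8, 0.2]` and costs `[[0, 1], [10, 0]]` (missing the minority class 1 is ten times worse than a false alarm), the expected costs are 2.0 for predicting class 0 and 0.8 for predicting class 1, so the rule picks the minority class even though it is less probable.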