562 results about "Unbalanced data" patented technology

Unbalanced data. In this context, unbalanced data refers to classification problems where we have unequal instances for different classes. Having unbalanced data is actually very common in general, but it is especially prevalent when working with disease data where we usually have more healthy control samples than disease cases.
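
For a concrete sense of the imbalance described above, the snippet below counts class frequencies and computes an imbalance ratio; it is a minimal sketch, and the label list and 9:1 ratio are invented for illustration.

```python
from collections import Counter

# Hypothetical binary labels: 90 healthy controls vs. 10 disease cases
labels = [0] * 90 + [1] * 10

counts = Counter(labels)                      # {0: 90, 1: 10}
majority = max(counts, key=counts.get)
minority = min(counts, key=counts.get)
imbalance_ratio = counts[majority] / counts[minority]
print(f"majority={majority}, minority={minority}, ratio={imbalance_ratio:.1f}")  # ratio=9.0
```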

High potential user buying intention prediction method based on big data user behavior analysis

The invention provides a high-potential-user buying intention prediction method based on big-data user behavior analysis. The method comprises the following steps: 101 data preprocessing: the historical behavior data set of the e-commerce users is preprocessed; 102 sample definition and labeling: samples are constructed from user-product pairs that have interacted, keyed on the users' historical consumption behavior; 103 division into training and test sets: the historical data are split into a training set and a test set using a time-window division method; 104 feature construction: feature engineering is performed on the users' historical behavior data; and 105 algorithm design and implementation: feature selection on the feature group and unbalanced-data processing of the data set are performed, and the final result is predicted by a two-layer iterative model learning algorithm. The prediction model is built on 45 days of e-commerce user behavior history, so that whether a user will place an order for a commodity in the candidate commodity set P in the following 5 days can be predicted.
Owner:上海普瑾特信息技术服务股份有限公司
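
The time-window split in step 103 can be illustrated with a small pandas sketch; the column name `event_time`, the cutoff date, and the 45-day/5-day windows are assumptions taken from the abstract, not the patent's actual data schema.

```python
import pandas as pd

def time_window_split(df, feature_end, label_days=5):
    """Split behavior logs into a feature window and a label window.

    df must contain a datetime64 'event_time' column; everything before
    `feature_end` is used to build features, and the following `label_days`
    days provide the purchase labels to predict.
    """
    feature_end = pd.Timestamp(feature_end)
    label_end = feature_end + pd.Timedelta(days=label_days)
    feature_window = df[df["event_time"] < feature_end]
    label_window = df[(df["event_time"] >= feature_end) & (df["event_time"] < label_end)]
    return feature_window, label_window

# Usage (hypothetical log): 45 days of history up to 2024-02-15, predict the next 5 days
# features_df, labels_df = time_window_split(logs, "2024-02-15")
```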

Software failure positioning method based on machine learning algorithm

The invention discloses a software failure positioning method based on a machine learning algorithm, aiming to solve the technical problem of the low positioning efficiency of existing software failure positioning methods. According to the technical scheme, the method comprises the following steps: describing the failure distribution that may exist in an actual program with a Gaussian mixture distribution, so that the failure distribution in the program becomes more definite; removing redundant test samples with a cluster analysis method based on the Gaussian mixture model and finding a dedicated test set for each specific failure, so that the adverse effect of redundant test cases on positioning precision is reduced; modifying a support vector machine model to adapt it to unbalanced data samples, and finding the nonlinear mapping relation between test-case coverage information and execution results by means of parallel debugging theory, so that the machine learning algorithm is freed from the local-optimum problem caused by uneven samples; and finally, designing a virtual test suite, feeding it into the trained model for prediction, obtaining a ranking of statement suspiciousness values, and performing failure positioning. In this way, software failure positioning efficiency is improved.
Owner:北京京航计算通讯研究所
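
A rough sketch of the Gaussian-mixture-based pruning of redundant test cases might look like the following; the coverage-matrix input and the keep-one-per-cluster rule are illustrative assumptions rather than the patent's exact procedure.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def prune_redundant_tests(coverage, n_components=5, random_state=0):
    """coverage: (n_tests, n_statements) binary matrix of test-case coverage.

    Cluster the test cases with a Gaussian mixture model and keep only the
    test case closest to each component mean, discarding the rest as redundant.
    """
    gmm = GaussianMixture(n_components=n_components, random_state=random_state)
    labels = gmm.fit_predict(coverage)
    keep = []
    for k in range(n_components):
        members = np.where(labels == k)[0]
        if len(members) == 0:
            continue
        dists = np.linalg.norm(coverage[members] - gmm.means_[k], axis=1)
        keep.append(int(members[np.argmin(dists)]))
    return sorted(keep)
```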

Abnormal intrusion detection ensemble learning method and apparatus based on Wiener process

The invention relates to an abnormal intrusion detection ensemble learning method based on the Wiener process. The method comprises the following steps: (1) selecting a network traffic data set; (2) inputting each network traffic sample and its sample probability distribution into an uninitialized neural network classifier or the neural network weak classifier obtained in the previous round of training, judging whether the weak classifier misclassifies each sample, and adjusting the number of samples and the sample probability distribution accordingly; (3) repeating step 2 to obtain a plurality of neural network weak classifiers; (4) determining the weight of each neural network weak classifier; (5) obtaining a strong classifier from the weak classifiers and their corresponding weights; (6) inputting the network data flow to be detected into the strong classifier to obtain intrusion detection results; and (7) repeating step 6 until all the network data flow to be detected has been processed. According to the method and apparatus of the invention, the problem of classifying unbalanced data sets can be solved, and an unbiased classifier with high classification accuracy can be obtained.
Owner:INST OF INFORMATION ENG CAS
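
The reweighting loop in steps 2 and 3 can be sketched generically as below; this follows a classic AdaBoost-style update rather than the patent's Wiener-process-based adjustment, and uses a small MLP as the weak classifier purely for illustration.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def boost_weak_classifiers(X, y, rounds=5):
    """Train several weak classifiers, reweighting samples after each round."""
    n = len(y)
    w = np.full(n, 1.0 / n)                                  # sample probability distribution
    classifiers, alphas = [], []
    for _ in range(rounds):
        idx = np.random.choice(n, size=n, replace=True, p=w)  # weighted resample of the traffic
        clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=300)
        clf.fit(X[idx], y[idx])
        miss = clf.predict(X) != y
        err = np.clip(np.dot(w, miss), 1e-10, 1 - 1e-10)      # weighted error rate
        alpha = 0.5 * np.log((1 - err) / err)                 # weak-classifier weight
        w = w * np.exp(alpha * np.where(miss, 1.0, -1.0))     # boost misclassified samples
        w /= w.sum()
        classifiers.append(clf)
        alphas.append(alpha)
    return classifiers, alphas
```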

Unbalanced data classification method based on unbalanced classification indexes and integrated learning

The invention discloses an unbalanced data classification method based on unbalanced classification indexes and ensemble learning, and mainly solves the problem of the low classification accuracy on the minority class of unbalanced data in the prior art. The method comprises the following steps: (1) a training set and a testing set are selected; (2) training sample weights are initialized; (3) part of the training samples is selected according to the training sample weights to train a weak classifier, and the trained weak classifier is used to classify all training samples; (4) the classification error rate of the weak classifier on the training set is calculated, compared with a set threshold value, and optimized; (5) the voting weight of the weak classifier is calculated from the error rate, and the training sample weights are updated; (6) whether the training of the weak classifiers has reached the maximum number of iterations is judged; if so, a strong classifier is computed from the weak classifiers and their voting weights, and otherwise the operation returns to step (3). The classification accuracy on the minority class is improved, and the method can be applied to the classification of unbalanced data.
Owner:XIDIAN UNIV
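
Step (5), computing voting weights from the error rate and updating sample weights, is conventionally done with the standard AdaBoost formulas shown below; the patent's variant adds the threshold comparison and error-rate optimization of step (4), which is not reflected here.

```latex
\[
\alpha_t = \frac{1}{2}\ln\!\frac{1-\varepsilon_t}{\varepsilon_t},
\qquad
w_{t+1,i} = \frac{w_{t,i}\,\exp\!\bigl(-\alpha_t\, y_i\, h_t(x_i)\bigr)}{Z_t},
\]
where $\varepsilon_t$ is the weighted error rate of weak classifier $h_t$,
$y_i \in \{-1,+1\}$, and $Z_t$ normalizes the weights to sum to one; the strong
classifier is $H(x) = \operatorname{sign}\bigl(\sum_t \alpha_t h_t(x)\bigr)$.
```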

Deep transfer learning-based unbalanced classification ensemble method

The invention discloses a deep transfer learning-based unbalanced classification ensemble method. The method comprises the following steps: an auxiliary data set is established; an auxiliary deep network model and a target deep network model are constructed; the auxiliary deep network is trained; the structure and parameters of the auxiliary deep network are transferred to the target deep network; and the products of AUPRC values are calculated and adopted as the weights of the classifiers, and weighted ensembling is performed on the classification results of each transfer classifier, so that an ensemble classification result is obtained and used as the output of the ensemble classifier. According to the method of the present invention, an improved average precision variance loss function (APE) and an average precision cross-entropy loss function (APCE) are adopted; when the loss cost of samples is calculated, the weights of the samples are dynamically adjusted: smaller weights are assigned to majority-class samples and larger weights to minority-class samples, so that the trained deep network pays more attention to the minority-class samples, making the method better suited to the classification of unbalanced data.
Owner:SOUTH CHINA UNIV OF TECH
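
The idea of giving minority samples larger loss weights can be illustrated with a plain class-weighted cross-entropy in NumPy; this is a generic sketch, not the patent's APE/APCE loss functions, and the example weights are invented.

```python
import numpy as np

def weighted_cross_entropy(probs, y, class_weights):
    """probs: (n, n_classes) predicted probabilities; y: integer class labels.

    class_weights assigns a larger weight to the minority class so that its
    samples contribute more to the average loss.
    """
    eps = 1e-12
    picked = probs[np.arange(len(y)), y]                  # probability of the true class
    return float(np.mean(class_weights[y] * -np.log(picked + eps)))

# Example: class 1 is the minority, so it gets a 9x larger weight
probs = np.array([[0.9, 0.1], [0.2, 0.8], [0.7, 0.3]])
y = np.array([0, 1, 1])
print(weighted_cross_entropy(probs, y, class_weights=np.array([1.0, 9.0])))
```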

Text classification method, text classifier and storage medium for unbalanced data set

The invention discloses a text classification method, a text classifier and a storage medium for an unbalanced data set. The method comprises the following steps: acquiring the data set used for training a classification model; determining whether each text data item belongs to the majority class or the minority class according to the category information with which the text data are labeled; calculating the ratio between the number of majority-class samples and the number of minority-class samples to obtain the imbalance ratio; pre-processing the text data to obtain corresponding sample points and mapping them into a vector space; updating the data set after interpolation samples are obtained based on a preset interpolation strategy, the imbalance ratio and each sample point; training the classification model using the updated data set as the training sample set; and acquiring the text data to be tested and feeding them into the trained classification model for classification, so as to obtain the category of the text data to be tested as the classification result. According to the invention, the minority class and its boundary region can be enlarged, and the classification effect of the model can be effectively improved.
Owner:WEBANK (CHINA)
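
The interpolation step resembles SMOTE-style synthesis in the vector space; below is a minimal sketch using the imbalanced-learn library, where the TF-IDF vectorization and parameter values are assumptions rather than the patent's own interpolation strategy.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from imblearn.over_sampling import SMOTE

def balance_text_dataset(texts, labels, random_state=0):
    """Map texts into a vector space, then interpolate new minority samples."""
    X = TfidfVectorizer(max_features=5000).fit_transform(texts)
    X_res, y_res = SMOTE(random_state=random_state).fit_resample(X, labels)
    return X_res, y_res
```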

Unbalanced data sampling method in improved C4.5 decision tree algorithm

The invention relates to an unbalanced data sampling method in an improved C4.5 decision tree algorithm. The method comprises the following steps: firstly, initial weights of the samples are determined according to the number of samples in each class; the sample weights are modified by the training result of the improved C4.5 decision tree algorithm in each round, where the division criterion of the improved C4.5 algorithm takes both the information gain ratio and the weights of misclassified samples into account; the final sample weights are obtained after T iterations; the samples in minority-class boundary regions and majority-class center regions are identified according to the sample weights; over-sampling is carried out on the samples in the minority-class boundary regions with the SMOTE algorithm; and under-sampling is carried out on the majority-class samples by a weighted sampling method, so that samples in the center regions are more likely to be selected, improving the balance between the different classes of data and the recognition rates of both the minority class and the overall data set. According to the method, weight modification is carried out through the improved C4.5 decision tree algorithm, and over-sampling and under-sampling are carried out specifically according to the sample weights, so that phenomena such as classifier over-fitting and loss of useful information from the majority class are effectively avoided.
Owner:CHONGQING UNIV OF POSTS & TELECOMM
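
The weight-based under-sampling of the majority class can be sketched as follows; the supplied weights are stand-ins for the scores produced by the improved C4.5 iterations (e.g. higher for center-region samples), and the function names are hypothetical.

```python
import numpy as np

def weighted_undersample(X_majority, weights, n_keep, random_state=0):
    """Draw majority-class samples without replacement, with probability
    proportional to the supplied weights, so higher-weighted samples
    (e.g. those in the class center regions) are more likely to be kept."""
    rng = np.random.default_rng(random_state)
    p = np.asarray(weights, dtype=float)
    p = p / p.sum()
    idx = rng.choice(len(X_majority), size=n_keep, replace=False, p=p)
    return X_majority[idx], idx
```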

Unbalanced data classification method

The invention relates to an unbalanced data classification method. The method comprises the following steps: in a labeled data set L, each data point is first processed by calculating its distances to all data points of other classes and keeping the shortest such distance as the data point's characteristic value; all data points are sorted by this characteristic value in ascending order, the first t data points with the smallest values form the initial training set T, and the remaining data points form the initial non-training set N; a support vector machine is then trained on T iteratively with an active learning strategy; after training begins, a temporary classification hyperplane P is generated in each iteration and used to trial-classify all data points in N; if mispredicted data points exist, one of them is drawn at random and added to the training set T, and at the same time the data point in N closest to P is also added to T; if no mispredicted data points exist in N, the data point closest to P is selected and added to the training set T, and subsequent training is carried out.
Owner:NANJING UNIV
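
One iteration of the active learning loop might be sketched like this; the linear kernel and the use of the SVM decision function as the distance to the hyperplane are simplifying assumptions.

```python
import numpy as np
from sklearn.svm import SVC

def active_learning_step(X_train, y_train, X_pool, y_pool, rng):
    """Train an SVM on T, then move informative points from N into T:
    the point closest to the current hyperplane P, plus one randomly
    chosen mispredicted point if any exist."""
    clf = SVC(kernel="linear").fit(X_train, y_train)
    margins = np.abs(clf.decision_function(X_pool))      # distance to hyperplane P
    wrong = np.where(clf.predict(X_pool) != y_pool)[0]
    picks = {int(np.argmin(margins))}                    # always take the closest point
    if len(wrong) > 0:
        picks.add(int(rng.choice(wrong)))                # plus one random mispredicted point
    picks = sorted(picks)
    X_train = np.vstack([X_train, X_pool[picks]])
    y_train = np.concatenate([y_train, y_pool[picks]])
    X_pool = np.delete(X_pool, picks, axis=0)
    y_pool = np.delete(y_pool, picks, axis=0)
    return clf, X_train, y_train, X_pool, y_pool

# rng = np.random.default_rng(0); loop this step until N stops shrinking
```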

Data classification method based on intuitive fuzzy integration and system

The invention relates to the field of pattern recognition, and discloses an unbalanced data classification method based on intuitive fuzzy integration and a system based on the method. The method comprises the following steps: a) cleaning the original data, and grouping the original positive (POS) class samples according to their intra-class positions to generate artificial POS-class samples; b) training base classifiers using different, approximately class-balanced sample sets; c) converting the classification outputs of the base classifiers into an intuitive fuzzy matrix; and d) integrating, for each sample to be classified and in combination with the weights of the base classifiers, the membership and non-membership degrees of the positive (POS) and negative (NEG) classes, and making the classification decision. The invention has the following advantages: overfitting is avoided by combining over-sampling and under-sampling; the training samples of the base classifiers differ, which ensures the diversity of the base classifiers; the base classifiers are not specifically restricted, so the method has good extensibility; and the intuitive fuzzy reasoning method quantitatively describes the uncertainty in classification, improving the performance of the ensemble learning; therefore, the system based on the method can better support medical diagnosis decisions and the like.
Owner:NANJING NORMAL UNIVERSITY
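
The final decision step, combining the base classifiers with their weights, can be approximated by ordinary weighted soft voting; this sketch uses plain class probabilities rather than the patent's intuitionistic fuzzy membership and non-membership degrees.

```python
import numpy as np

def weighted_vote(classifiers, weights, X):
    """Average the base classifiers' class probabilities, weighted per classifier,
    and pick the class with the highest combined score."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    scores = sum(w * clf.predict_proba(X) for w, clf in zip(weights, classifiers))
    return np.argmax(scores, axis=1)
```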

Selective up-sampling combined method for weighted ensemble classification prediction of unbalanced data flows

The invention relates to the technical field of data mining, and discloses a selective up-sampling combined method for weighted ensemble classification prediction of unbalanced data flows. The method comprises the following steps: screening minority-class samples from historical data blocks according to their similarity, and selecting the samples closest in concept to the current training data block; synthesizing new samples from the selected samples in the decision boundary area, so that up-sampling is applied selectively; and carrying out weighted ensemble classification of the new samples using a weight assignment strategy based on probability-distribution relevancy. According to the method, minority-class sample information is effectively increased by selecting highly similar historical data and synthesizing new data in the boundary area, so that the decision domain of the minority class is enlarged; meanwhile, in order to adapt to dynamic data with concept drift while using an ensemble classification approach, the probability-distribution-relevancy-based weight assignment strategy is designed, so that the overall classification precision is enhanced. Experimental results show that the method effectively improves the minority-class identification rate and the overall classification performance, and better handles unbalanced data flows.
Owner:NORTHEASTERN UNIV
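
The similarity-based screening of historical minority samples could be sketched as selecting the historical points nearest to the current block's minority centroid; the Euclidean distance and centroid comparison are assumptions, since the abstract does not specify the similarity measure.

```python
import numpy as np

def select_similar_history(hist_minority, current_minority, k):
    """Return the k historical minority samples closest in concept to the
    current training block, approximated by distance to its minority centroid."""
    centroid = current_minority.mean(axis=0)
    dists = np.linalg.norm(hist_minority - centroid, axis=1)
    return hist_minority[np.argsort(dists)[:k]]
```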

Weight clustering and under-sampling-based unbalanced data classification method

Status: Inactive · Publication number: CN106778853A
The classification of unbalanced data sets has become one of the most challenging problems in data mining. The number of minority-class samples is far smaller than the number of majority-class samples, so the minority classes suffer from low accuracy, poor generalization performance and other defects in the classification learning process of conventional algorithms. Algorithm ensembling has become an important way of dealing with this problem, and in particular ensemble algorithms based on random under-sampling and on clustering can effectively improve classification performance. However, the former easily causes information loss, while the latter is computationally complex and hard to generalize. The invention provides an improved ensemble classification algorithm based on weight clustering fused with under-sampling, specifically a weight clustering and under-sampling-based unbalanced data classification method. The algorithm divides clusters according to the weights of the samples, extracts a certain proportion of majority-class samples and all minority-class samples from each cluster according to the samples' weight values to form a balanced data set, and integrates the classifiers using the Adaboost algorithm framework, so that the classification effect is improved. Experimental results show that the algorithm is accurate, simple and highly stable.
Owner:CENT SOUTH UNIV
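
The cluster-then-sample step can be sketched with k-means standing in for the patent's weight-based clustering; the per-cluster sampling fraction and the 0/1 labels are illustrative parameters, not values from the patent.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_undersample(X_maj, X_min, n_clusters=5, frac=0.3, random_state=0):
    """Cluster the majority class, draw a fraction from every cluster, and
    combine the draws with all minority samples to form a balanced training set."""
    rng = np.random.default_rng(random_state)
    labels = KMeans(n_clusters=n_clusters, random_state=random_state, n_init=10).fit_predict(X_maj)
    kept = []
    for k in range(n_clusters):
        members = np.where(labels == k)[0]
        if len(members) == 0:
            continue
        n_keep = max(1, int(frac * len(members)))
        kept.extend(rng.choice(members, size=n_keep, replace=False))
    X_bal = np.vstack([X_maj[kept], X_min])
    y_bal = np.concatenate([np.zeros(len(kept)), np.ones(len(X_min))])  # 0 = majority, 1 = minority
    return X_bal, y_bal
```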

Prediction method for unbalanced data set based on isolated forest learning

The invention discloses a prediction method for an unbalanced data set based on isolated forest learning. The prediction method comprises the following steps: receiving a prediction request; collecting data, and defining the features and labels in the data set and the numbers of minority-class and majority-class samples; converting non-numerical feature columns and the label column in the data set into categorical numerical values; synthesizing minority-class samples using a majority-weighted minority oversampling technique to form a balanced data set; performing outlier identification and removal on the balanced data set using the isolation forest algorithm; then standardizing the data and dividing it into a training set and a test set; constructing and training a support vector machine classifier model with the training set; adjusting the hyper-parameters of the support vector machine classifier model through a genetic algorithm to obtain the prediction model once training is completed; and inputting the test set into the prediction model to obtain the prediction result. The prediction method for an unbalanced data set based on isolated forest learning has the characteristics of a stable prediction result and high prediction precision.
Owner:XIAN UNIV OF TECH
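
The outlier-removal and classifier-training steps might look like the sketch below; the genetic-algorithm hyper-parameter search from the abstract is omitted, and the contamination value is an assumption.

```python
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def train_isoforest_svm(X, y, contamination=0.05, random_state=0):
    """Remove anomalous points with an isolation forest, standardize the data,
    then train an SVM classifier on the cleaned set."""
    iso = IsolationForest(contamination=contamination, random_state=random_state)
    mask = iso.fit_predict(X) == 1          # 1 = inlier, -1 = outlier
    X_clean, y_clean = X[mask], y[mask]
    scaler = StandardScaler().fit(X_clean)
    clf = SVC().fit(scaler.transform(X_clean), y_clean)
    return scaler, clf
```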

High-dimensional imbalanced data classification method based on SVM

The invention proposes a high-dimensional imbalanced data classification method based on SVM. The method includes two parts. The first part is feature selection. An SVM-BRFE algorithm is used to carry out boundary resampling to find the optimal feature weights for measuring feature importance, selecting features and updating the training set, and this process is repeated. Finally, the features most conducive to improving the F1 value are retained, and the other features are removed. The subsequent training is then carried out with as little feature redundancy and as few irrelevant feature combinations as possible and with the dimensionality as low as possible, reducing both the influence of the high-dimensionality problem on the imbalance problem and the constraints on the SMOTE oversampling algorithm. The second part is data sampling. An improved SMOTE algorithm, namely the PBKS algorithm, is used. Minority-class samples in the boundary region automatically partitioned by the SVM are used to define distance constraints, computed as the distances d(x_i, x_j) in the Hilbert space H, which replace the original constraints, and a grid method is used to find the approximate preimage. The method provided by the invention can complete the classification task on high-dimensional unbalanced data stably and effectively, and can obtain considerable results.
Owner:HARBIN INST OF TECH SHENZHEN GRADUATE SCHOOL
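
Recursive feature elimination with a linear SVM gives a feel for the first part; this uses scikit-learn's plain RFE rather than the patent's SVM-BRFE boundary-resampling variant, and the parameter values are illustrative.

```python
from sklearn.feature_selection import RFE
from sklearn.svm import SVC

def svm_rfe_select(X, y, n_features=20, step=0.1):
    """Rank features by linear-SVM weights and recursively drop the weakest."""
    selector = RFE(estimator=SVC(kernel="linear"), n_features_to_select=n_features, step=step)
    selector.fit(X, y)
    return selector.support_, selector.ranking_
```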