219 results about "Data imbalance" patented technology

Imbalance means that the number of data points available for the different classes differs: with two classes, balanced data would mean 50% of the points in each class. For most machine learning techniques, a little imbalance is not a problem.
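
As a rough illustration of the definition above, the share of each class and the majority-to-minority ratio can be computed directly from the labels. This is a minimal sketch; the function name `class_balance` and the toy labels are assumptions made for the example.

```python
from collections import Counter


def class_balance(labels):
    """Return each class's share of the data and the majority/minority count ratio."""
    counts = Counter(labels)
    total = sum(counts.values())
    shares = {cls: n / total for cls, n in counts.items()}
    ratio = max(counts.values()) / min(counts.values())
    return shares, ratio


# Toy data: 90 negative and 10 positive examples (illustrative only).
labels = ["neg"] * 90 + ["pos"] * 10
shares, ratio = class_balance(labels)
print(shares, ratio)
```

A ratio of 1 corresponds to balanced data; larger values indicate stronger imbalance.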

Estimating conversion rate in display advertising from past performance data

Embodiments of the invention present an approach to conversion rate estimation which relies on using past performance observations along user, publisher, and advertiser data hierarchies. More specifically, embodiments of the invention model the conversion event at different select hierarchical levels with separate binomial distributions and estimate the distribution parameters individually. It is shown how to combine these individual estimators using logistic regression to identify conversion events accurately. Embodiments of the invention can also handle many practical issues, such as data imbalance, missing data, and output probability calibration, which render this estimation problem more difficult for a real-world implementation of the approach.
Owner:AMOBEE

Short-impending rainfall prediction method based on ConvLSTM and 3D-CNN

Active | CN110363327A | Evenly distributed | Increased accuracy of rainstorm forecasts | Forecasting | Neural architectures | Data imbalance | Short-term memory
The invention discloses a short-impending rainfall prediction method based on ConvLSTM and 3D-CNN, and belongs to the technical field of weather forecasting. The method comprises the following steps: firstly, inputting a historical radar echo map, a gridded temperature and the total rainfall at a moment t, and performing data cleaning and denoising on the historical radar echo map and the gridded temperature; performing statistical analysis on the rainfall data imbalance problem, and establishing a new loss function using different weights at different rainfall rate levels; secondly, standardizing the gridded temperature and the total rainfall by using a meteorological data mapping method based on power and logarithm transformation; and finally, fusing the t-moment input data subjected to the previous step into a data block, carrying out model building and testing based on a convolutional long short-term memory (ConvLSTM) neural network and a three-dimensional convolutional neural network, and outputting a short-impending rainfall prediction result. According to the method, the rainstorm prediction precision can be improved, the meteorological data is reasonably visualized and standardized, the image features of various meteorological data are fused, and the noise interference is reduced.
Owner:SOUTHEAST UNIV
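
The abstract above mentions a loss function that uses different weights at different rainfall rate levels but gives no formula. One way such a loss might look is a weighted mean squared error, sketched below. The thresholds, weights and function names are hypothetical and are not taken from the patent.

```python
import bisect

# Hypothetical rain-rate level boundaries (mm/h) and one weight per level.
# The patent abstract does not state any values; these are placeholders.
THRESHOLDS = [2.0, 5.0, 10.0, 30.0]
WEIGHTS = [1.0, 2.0, 5.0, 10.0, 30.0]


def level_weight(rain_rate):
    """Look up the weight of the level that the observed rain rate falls into."""
    return WEIGHTS[bisect.bisect_right(THRESHOLDS, rain_rate)]


def weighted_mse(predicted, observed):
    """Mean squared error where each term is scaled by the observed level's weight."""
    total = 0.0
    for p, o in zip(predicted, observed):
        total += level_weight(o) * (p - o) ** 2
    return total / len(observed)


# The same 1 mm/h error costs far more on a heavy-rain cell than on a dry one.
print(weighted_mse([1.0, 41.0], [0.0, 40.0]))
```

Weighting by the observed level makes errors on rare heavy-rain cells dominate the loss, which is the usual motivation for this kind of reweighting.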

Mechanical part fault diagnosis method based on deep learning under data imbalance

The invention discloses a mechanical part fault diagnosis method based on deep learning under data imbalance. The mechanical part fault diagnosis method comprises the steps: firstly obtaining an original vibration signal from a sensor, and obtaining frequency domain data through fast Fourier transform; then, inputting the frequency domain data into a generative adversarial network based on a Wasserstein distance; after multiple rounds of adversarial training of the generator and the discriminator, when the WGAN reaches Nash equilibrium, generating a large amount of fault sample data from the generator, and then mixing the generated fault sample data into original fault sample data to balance a data set; and finally, converting the balanced sample data into two-dimensional data, and inputting the two-dimensional data into a global average pooling convolutional neural network for feature extraction and fault classification to realize fault diagnosis of the mechanical parts. According to the invention, the WGAN is used to reasonably solve the problem of data imbalance, and the GAPCNN is used to carry out fault classification diagnosis, so that the diagnosis precision is improved.
Owner:CENT SOUTH UNIV

Ensemble-of-under-sampled extreme learning machine

The invention relates to an ensemble-of-under-sampled extreme learning machine, characterized in that, for a training sample with class data imbalance, random under-sampling is first performed on the majority sample (FP data) in the training sample, and the majority sample is then segmented into N majority subsamples according to the ratio N of the majority sample to the minority sample; the N majority subsamples are each combined with the minority sample to form N training subsets; N extreme learning machines are trained on the N training subsets to obtain N classifiers; test samples are fed to the N classifiers, each of which produces a classification result; a decision threshold value D is set, the classification results are combined, and the combined classification result is compared with the decision threshold value D to decide the final classification result, wherein all the classifiers have the same voting weight. The ensemble-of-under-sampled extreme learning machine has relatively high classification efficiency and a simple parameter adjustment method.
Owner:TIANJIN UNIV
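
The ensemble scheme in the abstract above (split the majority class into N parts, train one classifier per part together with the minority class, then vote against a threshold D) can be sketched as follows. This is only an illustration: a nearest-centroid learner stands in for the extreme learning machine, the majority class is shuffled and partitioned rather than sampled in any patent-specific way, and all function names are assumptions.

```python
import random


def train_centroid(samples):
    """Stand-in base learner (not an ELM): store one centroid per class."""
    sums, counts = {}, {}
    for x, y in samples:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def predict_centroid(model, x):
    """Predict the class whose centroid is closest to x."""
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(model, key=lambda y: dist2(model[y]))


def build_ensemble(majority, minority, seed=0):
    """Split the majority class into N parts (N = size ratio) and train N learners."""
    rng = random.Random(seed)
    n = max(1, len(majority) // len(minority))
    shuffled = majority[:]
    rng.shuffle(shuffled)
    subsets = [shuffled[i::n] for i in range(n)]
    return [train_centroid(sub + minority) for sub in subsets]


def ensemble_predict(models, x, threshold=0.5, positive=1, negative=0):
    """Equal-weight vote: predict positive if the vote share reaches the threshold D."""
    votes = sum(1 for m in models if predict_centroid(m, x) == positive)
    return positive if votes / len(models) >= threshold else negative


# Toy data: 40 majority points near the origin, 10 minority points near (5, 5).
majority = [((i * 0.01, 0.0), 0) for i in range(40)]
minority = [((5.0 + i * 0.01, 5.0), 1) for i in range(10)]
models = build_ensemble(majority, minority)
print(len(models), ensemble_predict(models, (5.0, 5.0)))
```

Raising `threshold` makes the ensemble more conservative about predicting the minority class; lowering it trades precision for recall.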

Cross-project defect prediction method based on data screening and data oversampling

The invention discloses a cross-project defect prediction method based on data screening and data oversampling. Reasonable data screening and data imbalance processing strategies are designed, and cross-project historical software module data truly similar to the project module data is screened by means of a hierarchical clustering algorithm, so that a cross-project software defect prediction model is protected from the influence of irrelevant cross-project historical software module data; then by means of an oversampling method, defective software module data is added, and a new dataset with relatively balanced classification is obtained, so that the cross-project software defect prediction model is protected from the influence of an imbalanced training dataset. According to the technical scheme, the method has the advantages of being simple and efficient, and the performance of the cross-project software defect prediction model can be well improved.
Owner:WUHAN UNIV

Construction method for classifier

The invention relates to a construction method for a classifier. The construction method includes the following steps that a part of majority class training samples in a training sample set are removed through an undersampling method, and a current training sample set is updated through the undersampled training sample set, wherein the training sample set comprises the majority class training samples and minority class training samples, and the classes of all the training samples in the training sample set are known; oversampling is conducted on the minority class training samples in the training sample set, and the classifier is constructed through the oversampled training sample set. According to the construction method for the classifier, noise in the training samples is removed effectively, the problem of data imbalance can be solved effectively, the accuracy rate of training sample data classification is greatly increased, the calculation amount is small, and the method is simple.
Owner:HARBIN INST OF TECH
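
The two-step idea in the abstract above (undersample the majority class, then oversample the minority class) might be sketched as below. The abstract does not say which undersampling or oversampling techniques are used, so plain random sampling is substituted here; `keep_fraction` and the function name are hypothetical.

```python
import random


def undersample_then_oversample(majority, minority, keep_fraction=0.5, seed=0):
    """Randomly drop part of the majority class, then duplicate minority samples
    at random until both classes have the same size."""
    rng = random.Random(seed)
    n_keep = max(1, int(len(majority) * keep_fraction))
    kept = rng.sample(majority, n_keep)
    extra_needed = max(0, len(kept) - len(minority))
    extra = [rng.choice(minority) for _ in range(extra_needed)]
    return kept, minority + extra


# Toy data: 100 majority items and 10 minority items.
majority = list(range(100))
minority = [f"m{i}" for i in range(10)]
kept, grown = undersample_then_oversample(majority, minority)
print(len(kept), len(grown))
```

Doing the undersampling first keeps the amount of duplicated minority data smaller than oversampling against the full majority class would.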

Greedy Active Learning for Reducing Labeled Data Imbalances

A method, system and computer-usable medium are disclosed for reducing labeled data imbalances when training an active learning system. The ratio of instances having positive labels or negative labels in a collection of labeled instances associated with an input category used for learning is determined. A first instance for annotation is selected from a collection of unlabeled instances if a first threshold for negative instances, and a first threshold confidence level of being a positive instance of the input category, have been met. A second instance for annotation is selected if a second threshold for positive instances, and a second threshold confidence level of being a negative instance of the input category, have been met. The first and second instances are respectively annotated with a positive and negative label and added to the collection of labeled instances, which are then used for training.
Owner:IBM CORP

Bullet screen text classification method, device, equipment, and storage medium

Pending | CN110399490A | Improve performance | Solve the problem caused by the uneven distribution of proportional data | Character and pattern recognition | Selective content distribution | Data imbalance | Data set
The invention provides a bullet screen text classification method, a bullet screen text classification device, equipment and a storage medium. The method comprises the steps: obtaining an imbalance training data set with a pre-marked category, and dividing the training data set into a sufficient sample and an insufficient sample; training the sufficient samples by adopting a textCNN model; carrying out model training on the insufficient samples by adopting an SVM classifier; inputting a text to be tested into the trained textCNN model, and outputting classification probabilities of various categories in sufficient samples; and if the output classification probability is smaller than a first preset threshold, inputting the to-be-tested text into a trained SVM classifier, and outputting a predicted category. According to the method, the classification models for different text scales are obtained through separate training according to the sizes of the training samples, then the two classification models are combined to be used for classifying the to-be-detected text, the problem of data imbalance of the training samples is solved, compared with single model training, the risk of over-fitting can be reduced, bias is reduced, and the recognition accuracy is higher.
Owner:WUHAN DOUYU NETWORK TECH CO LTD
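
The inference step in the abstract above is a two-model cascade: use the primary model's answer unless its top probability falls below a threshold, in which case defer to the second model. A minimal sketch follows, with toy stand-in functions in place of the trained textCNN and SVM; the threshold value and all names are hypothetical.

```python
def cascade_classify(text, primary_probs, fallback_predict, threshold=0.6):
    """Return the primary model's top class, or the fallback model's prediction
    when the primary model's top probability is below the threshold."""
    probs = primary_probs(text)
    label, confidence = max(probs.items(), key=lambda kv: kv[1])
    if confidence < threshold:
        return fallback_predict(text)
    return label


def toy_primary(text):
    """Toy stand-in for the model trained on well-represented classes."""
    if "hello" in text:
        return {"greeting": 0.9, "spam": 0.1}
    return {"greeting": 0.5, "spam": 0.5}


def toy_fallback(text):
    """Toy stand-in for the model trained on under-represented classes."""
    return "rare_category"


print(cascade_classify("hello all", toy_primary, toy_fallback))
print(cascade_classify("???", toy_primary, toy_fallback))
```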

Few-sample cross-modal hash retrieval common representation learning method

Pending | CN111753189A | Capture dependencies | Improve cross-modal retrieval accuracy | Multimedia data clustering/classification | Still image data clustering/classification | Data imbalance | Feature extraction
The invention provides a few-sample cross-modal hash retrieval common representation learning method. According to the method, a oneself knowing-adversary knowing network is designed, which mainly comprises two modules: a oneself knowing module and an adversary knowing module. The oneself knowing module can fully utilize the information hidden in the data itself, fuse the features of different levels, and extract more global features; on the basis of the oneself knowing module, the adversary knowing module models the correlation of all the samples and captures the nonlinear dependence relationship between the data, so that the common representation of different modal data can be better learned. Finally, a loss function for maintaining intra-modal and inter-modal similarity is established, and training optimization is carried out on the network. According to the method, the problem of data imbalance under the condition of few samples can be effectively solved, and a more representative common representation can be learned, so that the cross-modal retrieval precision is greatly improved.
Owner:SUN YAT SEN UNIV

Federated learning training method based on model dispersion

The invention discloses a federated learning training method based on model dispersion. The invention relates to the field of artificial intelligence in edge computing. In a real environment, data are often non-uniform and not independent and identically distributed (non-IID), and the unbalanced distribution of the data causes the model updates uploaded to the central server by each client to differ to varying degrees, so that a high-quality model is difficult to train by randomly selecting the clients that participate in training. Meanwhile, the unbalanced distribution of the data can also amplify the influence of over-fitting, and causes model divergence when the influence is serious. According to the method, in order to train a high-quality model under the condition of data imbalance, an updating strategy of a dynamic loss function is adopted to improve the stability of the model, and clients are selected according to the importance of the model, so that the accuracy and convergence rate of the model are improved. Meanwhile, on the basis of the two, a large number of traversal times and a proper regularization parameter mu are selected, so that the performance of the model is optimal.
Owner:NANJING UNIV OF POSTS & TELECOMM

Biological information recognition method based on dynamic sample selection integration

The invention discloses a biological information recognition method based on dynamic sample selection integration, mainly solving the problem of low correct recognition rate of subclass samples caused by data imbalance. The realizing process for solving the problem comprises the following steps: (1) a training set is divided into a series of balanced sub data sets by adopting a training set dividing method; (2) the obtained balanced sub data sets are divided into respective matrix classifiers as initial training sets; (3) on the matrix classifiers, cyclic training is carried out by adopting a dynamic sample selecting method; (4) a testing set is tested by decision functions obtained in each training, thus obtaining decision results; (5) weight of the decision results is calculated by adopting a cost-sensitive idea; and (6) the decision results of each time are weighted and integrated, thus obtaining the final recognition result. Compared with the prior art, the method has the advantages of high accuracy and low calculation complexity, the size relation between a correct ratio and a recall ratio can be regulated as required, and the method is used for recognizing biological information, network intrusion and financial fraud and detecting anti-spam.
Owner:XIDIAN UNIV

A transformer state evaluation clustering analysis method based on data imbalance measurement

Active | CN109816031A | Improve status assessment accuracy | Clustering effect is good | Character and pattern recognition | Data imbalance | Transformer
The invention discloses a transformer state evaluation clustering analysis method based on data imbalance measurement. The method comprises the following steps of screening out index parameters corresponding to different types of fault analysis of the power transformer from unbalanced monitoring data according to a common fault index system of the power transformer, and processing the index parameters by using a proportional normalization method; randomly selecting two groups of data in the index parameters as an initial clustering center, and setting clustering analysis parameters; calculating the Euclidean distance between each type of fault index parameters and the initial clustering center, dividing data in the unbalanced monitoring data into a lower approximate set or a boundary area of a class cluster according to the Euclidean distance, and calculating the degree of unbalance between the class clusters; measuring the membership degree of the monitoring data by fusing the class cluster imbalance degree; carrying out iterative computation on the class cluster center according to the class cluster data distribution condition; and finally, carrying out state evaluation on the power transformer according to a clustering result. According to the present invention, the state evaluation precision of the power transformer is effectively improved.
Owner:NANJING UNIV OF POSTS & TELECOMM

Breast tumor classification algorithm based on convolutional neural network VGG16

The invention relates to a breast tumor classification algorithm based on a convolutional neural network VGG16. The algorithm comprises the following steps: data preprocessing: for a dataset which presents a data imbalance state, carrying out imbalance processing and data enhancement processing; the establishment of the convolutional neural network: 1) network pre-training: utilizing the VGG16 to carry out network training on the ImageNet large natural image dataset, and storing the trained weights; 2) network key node selection: utilizing different layers of the VGG16 network to carry out feature extraction on a breast tumor DDSM (Digital Database for Screening Mammography) dataset, applying the same SVM (Support Vector Machine) classifier to classify the extracted features, and selecting the layer with the highest classification performance as the node from which a new network is constructed; and 3) connecting two fully connected layers and one softmax layer behind the selected node to form a new network; and carrying out transfer learning.
Owner:TIANJIN UNIV

K neighbor-based Bayesian personalized recommendation method and device

The invention discloses a K neighbor-based Bayesian personalized recommendation method. The K neighbor-based Bayesian personalized recommendation method comprises the following steps: 1) through behavior data of a user, seeking K neighbors of the user; 2) according to observed positive feedback items of the user and observed positive feedback items of a user group consisting of k neighbor users of the user, dividing an item set; 3) determining an item level preference relation of the user; 4) maximizing the probabilities of all the users on the item set to obtain an objective function, wherein item prediction of the user is realized by adopting a matrix decomposition model; parameters in the objective function are solved by adopting a stochastic gradient descent method. The invention further discloses a K neighbor-based Bayesian personalized recommendation device. Through the K neighbor-based Bayesian personalized recommendation method and the K neighbor-based Bayesian personalized recommendation device, mutual impact between the users is taken into account, and through the impact, the item set is divided, so that the number of unobserved items is reduced, and an adverse impact caused by data imbalance and data sparseness in the recommendation process is effectively relieved.
Owner:PEKING UNIV +1

Network transaction fraud detection system based on twin neural network

The invention discloses a network transaction fraud detection system based on a twin neural network. The input data of the network transaction fraud detection system is composed of a group of data pairs. The network transaction fraud detection system is composed of two neural network models with the same structure, and the two neural network models achieve the twinning purpose by sharing weights. The network transaction fraud method based on the twin neural network has a very good experimental effect; according to the method, for the problems of time sequence sparsity and data imbalance in network transactions, a twinning structure is used for processing unbalanced data, and an LSTM structure is used for enabling a network to have a memory function, so that the detection capability of the network on fraud transactions is improved.
Owner:DONGHUA UNIV

Active incremental training method for deep learning multi-class medical image classification

The invention discloses an active incremental training method for deep learning multi-class medical image classification. The method comprises the following steps: 1, performing preliminary data cleaning and preprocessing on a medical image data set; 2, randomly selecting initial data, and carrying out initial training on the network model; 3, testing the rest samples in the data set to obtain the correspondence between the prediction score and the lesion category; 4, performing cross expansion on residual samples in the data set, and actively screening candidate samples; 5, performing further data set cleaning; 6, performing incremental training on the model; and 7, testing the model after incremental training, if the accuracy is stable, ending the training, and otherwise, repeating the steps 4 to 7. According to the method, an AIFT method is improved, and the problems of difficult medical image classification, low training efficiency and the like caused by data imbalance are solved. The problem that the application effect of deep learning in the field of lesion classification is poor is solved, and the auxiliary effect on disease diagnosis of doctors is improved.
Owner:UNIV OF ELECTRONICS SCI & TECH OF CHINA +1

Enterprise tax fraud detection method, electronic device and storage medium

The invention discloses an enterprise tax fraud detection method, an electronic device and a storage medium. The method comprises the following steps: acquiring a plurality of invoice training data, wherein the invoice training data includes tax abnormal enterprise invoice data and tax normal enterprise invoice data; performing feature processing on the invoice training data; establishing tax fraud model according to the invoice training data after feature processing; acquiring the data to be detected of the invoice, and carrying out feature processing on the data to be detected of the invoice; the invoice data to be detected after feature processing is calculated and detected by tax fraud model in order to obtain the tax fraud results of enterprises. The invention realizes the detection of the tax enterprise according to the invoice data of the enterprise, avoids the shortcoming that the traditional detection method needs all the business data of the enterprise, and simultaneously solves the data imbalance existing in the invoice data detection of the reason and the difficult problem of high cross fusion between the abnormal enterprise and the normal enterprise.
Owner:ZHONGKAI UNIV OF AGRI & ENG

Data equalization method based on deep learning multi-weight loss function

The invention relates to a data equalization method based on a deep learning multi-weight loss function, and the method comprises the steps: firstly obtaining a target image data set in a training process employing a deep learning model, determining the class number C of data samples and the size Ni of each class of samples according to the target data set, determining hyper-parameters [alpha] and [gamma] and a weighting coefficient Ci of the importance of each class of samples, and determining a multi-weight loss function MWLfocal (z, y), carrying out continuous iterative training by using the neural network model, carrying out error calculation by using the multi-weight loss function in the training process, and continuously updating weight parameters of the model by using a back propagation algorithm until network convergence reaches an expected target, thereby finally completing training. By means of the loss function, the problems of sample number imbalance and classification difficulty imbalance of different data classes can be solved at the same time, the detection accuracy of key classes can be further improved, the method can be applied to a data set with the data imbalance problem, and therefore the influence of the class imbalance problem is effectively relieved.
Owner:UNIV OF SCI & TECH OF CHINA
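
The abstract above names the ingredients of the multi-weight loss (class sizes Ni, hyper-parameters α and γ, and per-class importance coefficients Ci) but not its formula. The sketch below shows one plausible combination: a focal loss scaled by an inverse-frequency weight and an importance coefficient. The exact form, the function names and the default values are assumptions, not the patented MWLfocal definition.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]


def multi_weight_focal_loss(logits, label, class_sizes, importance,
                            alpha=1.0, gamma=2.0):
    """Assumed form: alpha * frequency weight * importance * focal term * CE.

    - frequency weight: total / (num_classes * N_label), larger for rare classes
    - importance[label]: user-chosen coefficient for key classes
    - (1 - p) ** gamma: focal term that down-weights easy examples
    """
    p = softmax(logits)[label]
    n_classes = len(class_sizes)
    freq_weight = sum(class_sizes) / (n_classes * class_sizes[label])
    focal = (1.0 - p) ** gamma
    return -alpha * freq_weight * importance[label] * focal * math.log(p)


# With equal class sizes, unit importance and gamma = 0 this reduces to
# ordinary cross-entropy.
print(multi_weight_focal_loss([0.0, 0.0], 0, [50, 50], [1.0, 1.0], gamma=0.0))
```

In this assumed form the frequency weight addresses the imbalance in sample counts, the focal term addresses the imbalance in classification difficulty, and the importance coefficient lets key classes be emphasized separately.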

Heterogeneous federated learning mine electromagnetic radiation trend tracking method based on SVD algorithm

The invention discloses a heterogeneous federated learning mine electromagnetic radiation intensity trend tracking method based on an SVD algorithm, and the method comprises: firstly proposing a heterogeneous model federated learning algorithm for the problem of data imbalance in a federated learning client, and setting a heterogeneous central model in a server for the client to select, so as to improve the precision of a local model; aiming at the problem of uploading communication cost of local model parameters, providing an efficient communication algorithm that an SVD algorithm is firstly used for decomposing a parameter matrix to obtain a corresponding singular value matrix, and then the singular value matrix is uploaded to a central server for aggregation updating; and finally, using the updated local model by each client to extract local data features, and using the features and real data values by each client to train the ESN and then to execute trend tracking. According to the invention, trend tracking of electromagnetic radiation intensity acquired by multiple sensors can be realized on the premise of protecting data privacy, the trend tracking precision of each client can be improved, and the communication cost required by a framework is reduced.
Owner:CHINA UNIV OF MINING & TECH

Real-time video face key point detection method based on deep learning

The invention relates to a real-time video face key point detection method based on deep learning, and the method employs a convolutional neural network to carry out the key point detection of a single frame, employs a depth separable convolution to improve the model detection rate, employs a boundary heat map as an additional subtask of an original network to improve the constraint of a global face structure of the original network. The method improves the detection accuracy of an original network, is used for solving a data imbalance loss function of a heat map, improves the generalization capability of a model for a large attitude sample under a limited sample, and improves the inter-frame smoothness through an optical flow loss function. In the detection process, for a frame of which the confidence is lower than a key point confidence threshold due to an extremely large angle, fitting is carried out by utilizing 3DMM to obtain dense key point coordinates, 68-point sampling is carried out on the obtained dense key points according to a projection error between minimum frames, and the consistency with the previous frame is kept. The method has the advantages of real-time performance, capability of utilizing global inter-frame information, high detection accuracy of a face large posture condition and the like.
Owner:HEBEI UNIV OF TECH

Software defect prediction method based on data imbalance

The invention discloses a software defect prediction method based on data imbalance, which comprises the following steps: taking various error reports with software metric values as an original data set for prediction from projects with known bug distribution; performing imbalance processing on the text matrix in the original data set by adopting an RSMOTE imbalance processing strategy to obtain a balanced data set; modeling the balance data set by using naive Bayes, polynomial naive Bayes, K neighbors, a support vector machine, a classification tree and Adaboost to find a classifier with an optimal prediction effect; and extracting a software metric value of a new project at an unknown bug position, inputting the software metric value into the classifier for prediction, outputting prediction information about whether each program segment has a bug or not, and recording and storing the prediction information. According to the method, the RSMOTE imbalance processing strategy is adopted to perform imbalance processing on the text matrix in the original data set, so that the generation of a few types of samples is more flexible, and more extensive and reasonable samples can be generated.
Owner:DALIAN MARITIME UNIVERSITY
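
The abstract above does not describe how RSMOTE differs from ordinary SMOTE, so the sketch below shows only the basic SMOTE-style idea it builds on: a synthetic minority sample is placed at a random point on the segment between a minority sample and one of its nearest minority neighbours. It is not the patented RSMOTE strategy, and the function name and parameters are assumptions.

```python
import random


def smote_like(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic points by interpolating between minority samples
    and their k nearest minority neighbours (plain SMOTE-style, not RSMOTE).
    Requires at least two minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Sort the other minority points by squared distance to the base point.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        t = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(a + t * (b - a) for a, b in zip(base, neighbour)))
    return synthetic


# Toy minority class: four points inside the unit square.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote_like(minority, n_new=6)
print(len(new_points))
```

Because each synthetic point lies between two existing minority points, it stays inside the region the minority class already occupies.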

Method of solving data imbalance based on Epochs

The invention discloses a method of solving data imbalance based on Epochs and belongs to the field of deep learning. In the training process, each Epoch randomly resamples each class according to weight so that the samples in each Epoch are evenly represented during training; the main idea is that addition is made to each sample according to the resampled weights, and a sample set of single-Epoch size is randomly resampled from the sample base according to the weight ratio, so that the data of the resampled Epochs are relatively balanced. The method has the advantage that the data imbalance problem can be solved more effectively.
Owner:BEIJING UNIV OF TECH
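
The per-epoch resampling described above might look like the following sketch, in which every epoch draws a fresh sample set of fixed size with replacement. The abstract does not say how the weights are chosen; inverse class frequency is assumed here, and the function name is hypothetical.

```python
import random
from collections import Counter


def sample_epoch(samples, labels, epoch_size, rng):
    """Draw one epoch's worth of (sample, label) pairs with replacement, weighting
    each sample by the inverse of its class frequency (an assumed weighting)."""
    counts = Counter(labels)
    weights = [1.0 / counts[y] for y in labels]
    indices = rng.choices(range(len(samples)), weights=weights, k=epoch_size)
    return [(samples[i], labels[i]) for i in indices]


# Toy data: 990 majority samples (label 0) and 10 minority samples (label 1).
samples = list(range(1000))
labels = [0] * 990 + [1] * 10
rng = random.Random(0)
for epoch in range(3):
    batch = sample_epoch(samples, labels, epoch_size=2000, rng=rng)
    minority_share = sum(1 for _, y in batch if y == 1) / len(batch)
    print(epoch, round(minority_share, 2))
```

Because a new set is drawn every epoch, different majority samples are seen over time, while each epoch stays roughly balanced between the classes.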

Clock recovery and adaptive equalizer combined device and method in coherent light communication

The invention discloses a clock recovery and adaptive equalizer combination device and method in coherent light communication, relates to the technical field of coherent light communication, and adopts a new equalizer structure and algorithm to realize clock recovery and adaptive equalization at the same time. Specifically, a two-stage real number equalization structure and a second-stage adaptive coefficient are adopted to calculate a timing error so as to realize clock recovery, polarization demultiplexing and dispersion compensation. According to the structure, an independent clock recovery module is omitted, compared with a traditional structure, the calculation complexity is reduced, the clock recovery accuracy is improved, and the tolerance to IQ data imbalance of a receiving end is improved. The two-stage equalization of the structure adopts real number equalization, so that the IQ two paths of data are not combined together any more in the equalization process, and the problem that the system performance is greatly influenced under the condition that IQ has time delay can be effectively solved.
Owner:WUHAN POST & TELECOMM RES INST CO LTD

Bearing fault intelligent diagnosis method for digital twin system

The invention discloses a bearing fault intelligent diagnosis method for a digital twin system, and belongs to the technical field of bearing fault unbalance detection. According to the method, the bearing fault diagnosis effect is improved under the condition that normal data and abnormal data in a digital twin system are unbalanced under the actual condition and original data are not expanded. The method comprises the steps of monitoring a bearing vibration signal of a target bearing in real time by a digital twin system; inputting the current bearing vibration signal of the target bearing into the bearing fault diagnosis network, and obtaining the current fault detection result of the target bearing based on the output of the bearing fault diagnosis network. The bearing fault diagnosis method is used for diagnosing the data imbalance phenomenon in the actual scene of the bearing, the diagnosis effect on the fault data can be improved in the actual scene with normal and fault data imbalance, and the health condition of the bearing equipment can be monitored in real time by utilizing the bearing fault diagnosis network set by the bearing fault diagnosis method in the digital twin system.
Owner:UNIV OF ELECTRONICS SCI & TECH OF CHINA

Integrated learning-based software defect reopening prediction method

The invention discloses an integrated learning-based software defect reopening prediction method. The method comprises the following steps of: S1, extracting LWEparagraph2vec-based semantic vector features from a defect report of software; S2, combining the LWEparagraph2vec-based semantic vector features extracted from the defect report of the software with meta features thereof to form a feature set; S3, constructing a prediction model according to an imbalanced data processing-based integrated learning prediction algorithm, the UnderSMOTEBagging method; and S4, obtaining a class label of a living example according to the feature set extracted in the step S2 and the prediction model obtained in the step S3, so as to judge whether defects of the software are going to be reopened or not. The method disclosed by the invention is capable of solving the problem that the prediction effect is not ideal due to data imbalance in software defect reopening prediction and finiteness of the used feature set.
Owner:XI AN JIAOTONG UNIV
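
The abstract above names UnderSMOTEBagging but does not describe its internals. As a rough illustration of one ingredient only, the sketch below shows under-sampled bagging: each bag keeps all minority samples plus an equally sized random subset of the majority class, and the bags vote. The nearest-centroid base learner, the toy 2-D data, and all function names are assumptions made for illustration; the SMOTE step and the LWEparagraph2vec features are not reproduced here.

```python
import random
from collections import Counter

def centroid(points):
    """Mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def sq_dist(a, b):
    """Squared Euclidean distance between two tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def train_under_bagging(majority, minority, n_bags=5, seed=0):
    """Train one nearest-centroid model per bag (hypothetical base learner).

    Each bag keeps every minority sample and a random majority subset
    of the same size, so every base model sees balanced classes.
    """
    rng = random.Random(seed)
    models = []
    for _ in range(n_bags):
        subset = rng.sample(majority, len(minority))  # under-sample the majority
        models.append((centroid(subset), centroid(minority)))
    return models

def predict(models, x):
    """Majority vote over bags: 1 = minority class, 0 = majority class."""
    votes = Counter(
        1 if sq_dist(x, c_min) < sq_dist(x, c_maj) else 0
        for c_maj, c_min in models
    )
    return votes.most_common(1)[0][0]

# Toy data (made up): 20 majority samples and 3 minority samples.
majority = [(0.1 * i, 0.0) for i in range(20)]
minority = [(5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
models = train_under_bagging(majority, minority)
```

Calling `predict(models, x)` on a new feature vector returns the voted class label. In a real setting the base learner would be a proper classifier and the bags could also be augmented with synthetic minority samples.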

Information classification method based on deep learning and related equipment

The invention relates to the technical field of data analysis, in particular to an information classification method based on deep learning and related equipment. The method comprises the steps of: obtaining the data quantity of to-be-recognized information, determining a clustering mode for the to-be-recognized information according to the data quantity, and preprocessing the to-be-recognized information in that clustering mode to obtain pre-classified data; performing word vector conversion on the pre-classified data to obtain word vectors of the pre-classified data; feeding the word vectors of the pre-classified data into a deep learning model for text feature extraction to obtain a plurality of text features; classifying the text features to obtain a classification result of the text features; and scoring the classification result by applying a voting mechanism, and determining a classification label of the to-be-recognized information according to the scoring result. The method effectively solves the problem that, because of data imbalance, text features extracted by a deep learning model cannot accurately reflect the content of the original information.
Owner:CHINA PING AN LIFE INSURANCE CO LTD

Automatic identification method for shale microsections

The invention discloses an automatic identification method for shale microsections. The method comprises the steps that (1) at the first stage, rock slices are divided into igneous rock, sedimentary rock and metamorphic rock; (2) at the second stage, shale is identified from the sedimentary rock obtained according to classification at the first stage. The classification identification technologies adopted at the two stages are both the decision tree technology, and extracted characteristics all belong to statistical characteristics of RGB channels and fractal characteristics of gray channels of images of the rock slices. According to the method, the shale microsections are automatically identified through the information processing technology at the two stages, so that the problem of non-ideal classification results caused by data imbalance is solved. With respect to characteristic selection, the good fractal characteristics of the shale are fully utilized, and the method is applicable to automatic identification of the shale. The automatic identification method is simple and efficient in calculation and has expansibility, and the accuracy of the identification method can be improved along with an increase in data storage of the rock slices; the method has application value in geological prospecting and mineral research.
Owner:NANJING UNIV

Oversampling method and device based on SMOTE algorithm and electronic equipment

The invention provides an oversampling method and device based on the SMOTE algorithm, and electronic equipment. The method comprises the following steps: acquiring a historical sample data set, and determining positive and negative samples and their corresponding numbers; determining majority class sample data and minority class sample data, and performing data vectorization processing; screening target sample data from the minority class sample data set by using a departure point monitoring method; performing oversampling on the target sample data based on the SMOTE algorithm to generate a specific number of new sample data; and obtaining an amplified minority class sample data set according to the generated new sample data and the original minority class sample data. The method optimizes the sampling procedure while addressing data imbalance, improving the accuracy of model prediction and effectively reducing the deviation caused by data imbalance.
Owner:北京淇瑀信息科技有限公司
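
The oversampling step in the abstract above can be illustrated with a minimal SMOTE-style sketch: a synthetic sample is placed at a random position on the line segment between a minority sample and one of its nearest minority neighbours. This is a generic illustration and not the patented procedure; the screening of target samples is omitted, and the function name, the parameter `k`, and the toy data are assumptions.

```python
import random

def smote_oversample(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours.

    Requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of base, excluding base itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, other)))
    return synthetic

# Toy minority class (made up) with four vectorized samples.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_oversample(minority, n_new=6)
augmented = minority + new_points  # amplified minority class data set
```

Because each synthetic point is a convex combination of two existing minority samples, it always stays inside the bounding box of the original minority data.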