Data resampling method based on clustering oversampling and instance hardness threshold
A resampling technique that combines clustering with oversampling, applicable to fields such as instruments, character and pattern recognition, and computer components, and intended to address problems such as prediction deviation.
Embodiment Construction
[0021] The present invention is further described below in conjunction with examples; the present invention includes, but is not limited to, the following examples.
[0022] The present invention provides a data resampling method based on clustering oversampling and an instance hardness threshold. Its basic implementation process is as follows:
[0023] 1. Clustering processing
[0024] K-means is a commonly used clustering algorithm that partitions data through iterative refinement. The present invention first applies the K-means algorithm to cluster the text data set. First, k texts are randomly selected from the text data set as the initial cluster centers, where k is the number of clusters to be obtained; different values of k affect the results of the subsequent cluster filtering and sampling-weight distribution. In the present invention, k is set to 2, 5, 10, or 15. The following process is then repeated: assign each text to the cluster center with the clos...
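The clustering step described above can be illustrated with a minimal sketch. Because the excerpt is truncated, the center-update step and the stopping rule below follow the standard K-means procedure and are assumptions, not the patent's confirmed wording. The sketch also assumes the texts have already been converted to numeric feature vectors (the excerpt does not say how), and the names `kmeans` and `squared_distance` are hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, max_iter=100, seed=0):
    """Standard K-means on a list of numeric vectors.

    Assumption: each text is already represented as a numeric vector.
    Returns (labels, centers), where labels[i] is the cluster index of
    vectors[i].
    """
    rng = random.Random(seed)
    # Randomly pick k data points as the initial cluster centers.
    centers = [list(v) for v in rng.sample(vectors, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each vector goes to its closest center.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(v, centers[c]))
            for v in vectors
        ]
        # Assumed stopping rule: stop once assignments no longer change.
        if new_labels == labels:
            break
        labels = new_labels
        # Assumed update step: move each center to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


if __name__ == "__main__":
    # Toy data: two well-separated groups standing in for vectorized texts.
    data = [[0, 0], [0, 1], [1, 0], [100, 100], [100, 101], [101, 100]]
    labels, centers = kmeans(data, k=2)
    print(labels)
    print(centers)
```

In practice the same loop would be run once for each candidate value of k (2, 5, 10, or 15 in the source), since the source notes that k influences the later cluster filtering and sampling-weight distribution.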