Machine learning sample oversampling method based on swarm intelligence

An oversampling technique for machine learning that addresses problems such as model overfitting, reducing negative effects and data bias.

Pending Publication Date: 2022-03-25
SICHUAN CHANGHONG ELECTRIC CO LTD

AI Technical Summary

Problems solved by technology

[0005] The technical problem to be solved by the present invention is to propose a machine learning sample oversampling method based on group wisdom, addressing the tendency of prior-art oversampling schemes to cause model overfitting by repeating positive samples.


Examples


Embodiment

[0027] As shown in Figure 1, the machine learning sample oversampling method based on group wisdom in this embodiment includes the following steps:

[0028] 1. Data collection:

[0029] Specifically, in this step a big data platform collects users' daily behavior data from smart TVs, yielding a massive volume of user behavior data. Taking the Hadoop big data platform as an example, data collection proceeds as follows:

[0030] a. When the user initiates a voice search for video, collect the user's click and viewing data after the search.

[0031] b. When the user conducts a text search, collect the user's click and viewing data after the search.

[0032] c. When the user browses the film and television recommendation page, collect the user's browsing, clicking, and viewing data.

[0033] d. Collect complete basic film and television data, including but not limited to title, classification, total duration, relevant cast and crew inform...
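The collection cases a to c above can be pictured as a stream of tagged behavior events. The following is only a minimal sketch under stated assumptions: the names `Source`, `Action`, `BehaviorEvent`, and `count_by_source`, and all field names, are hypothetical and do not come from the patent, which does not describe a record layout.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Source(Enum):
    """Entry point that led to the behavior (hypothetical naming)."""
    VOICE_SEARCH = "voice_search"      # case a
    TEXT_SEARCH = "text_search"        # case b
    RECOMMENDATION = "recommendation"  # case c


class Action(Enum):
    """Kind of behavior recorded (hypothetical naming)."""
    BROWSE = "browse"
    CLICK = "click"
    VIEW = "view"


@dataclass(frozen=True)
class BehaviorEvent:
    user_id: str
    video_id: str
    source: Source
    action: Action


def count_by_source(events):
    """Count collected events per entry point."""
    return Counter(e.source for e in events)


# Illustrative, made-up events.
events = [
    BehaviorEvent("u1", "v1", Source.VOICE_SEARCH, Action.CLICK),
    BehaviorEvent("u1", "v1", Source.VOICE_SEARCH, Action.VIEW),
    BehaviorEvent("u2", "v2", Source.RECOMMENDATION, Action.BROWSE),
]
counts = count_by_source(events)
```

Keeping the entry point on each event would let later steps apply different sample-extraction rules to search-driven and recommendation-driven behavior, if the required rule calls for it.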


Abstract

The invention relates to the field of machine learning and discloses a machine learning sample oversampling method based on swarm intelligence. It addresses the tendency of prior-art oversampling schemes to cause model overfitting by repeating positive samples. The scheme proceeds as follows: first, a large amount of user behavior data is obtained through a big data platform; second, positive and negative behavior sample sets are extracted from the user behavior data according to a required rule to form an original sample set; third, the users in the original sample set are grouped into low-activity users and non-low-activity users; fourth, for low-activity users, a supplementary positive sample set is generated using a machine learning algorithm based on group wisdom; fifth, samples that are duplicated or in conflict between the original sample set and the supplementary sample set are removed; and finally, samples are drawn from the supplementary sample set and added to the original sample set.
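The steps in the abstract can be sketched end to end as below. This is an illustration under loud assumptions, not the patented method: the abstract does not say how low activity is defined or which group-wisdom algorithm is used, so the sketch assumes "low activity" means at most `low_activity_max` positive samples and swaps in a simple co-viewing heuristic (items watched by other users who share a positive item) as a stand-in. The function `oversample` and its parameters are hypothetical.

```python
import random
from collections import defaultdict


def oversample(samples, low_activity_max=1, per_user=1, seed=0):
    """samples: list of (user, item, label); label 1 = positive, 0 = negative."""
    positives = defaultdict(set)
    for user, item, label in samples:
        if label == 1:
            positives[user].add(item)
    users = {user for user, _, _ in samples}

    # Grouping step (assumed rule): few positive samples means low activity.
    low_activity = {u for u in users if len(positives[u]) <= low_activity_max}

    # Generation step (stand-in heuristic, NOT the patent's algorithm):
    # propose items watched by other users who share a positive item.
    supplement = []
    for u in sorted(low_activity):
        candidates = set()
        for other in users - {u}:
            if positives[u] & positives[other]:
                candidates |= positives[other] - positives[u]
        supplement.extend((u, item, 1) for item in sorted(candidates))

    # Cleaning step: drop supplementary samples whose (user, item) pair
    # already appears in the original set with any label.
    seen = {(user, item) for user, item, _ in samples}
    supplement = [s for s in supplement if (s[0], s[1]) not in seen]

    # Sampling step: draw up to per_user supplementary samples per user
    # and append them to the original set.
    rng = random.Random(seed)
    by_user = defaultdict(list)
    for s in supplement:
        by_user[s[0]].append(s)
    added = []
    for u in sorted(by_user):
        pool = by_user[u]
        added.extend(rng.sample(pool, min(per_user, len(pool))))
    return samples + added


# Made-up data: two active users and one low-activity user.
original = [
    ("heavy1", "A", 1), ("heavy1", "B", 1), ("heavy1", "C", 1),
    ("heavy2", "A", 1), ("heavy2", "B", 1), ("heavy2", "D", 1),
    ("light", "A", 1), ("light", "B", 0),
]
augmented = oversample(original, low_activity_max=1, per_user=1)
```

In this toy data, item B is proposed for the low-activity user but discarded because it conflicts with that user's existing negative sample, so the single added positive is either C or D. Unlike plain duplication of existing positives, every added sample is a new (user, item) pair.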

Description

Technical field

[0001] The invention relates to the field of machine learning, and in particular to a machine learning sample oversampling method based on group wisdom.

Background art

[0002] Now that smart TVs are widespread, users strongly demand personalized, accurate video recommendations from recommendation algorithms. When a recommendation algorithm is used for model training, the Imbalance Rate (IR) of the data set is a major factor affecting the training results. An unbalanced data set may lead to poor model training or to training failure.

[0003] In actual business scenarios, almost all data sets are unbalanced. For example, in film and television recommendation, if a user watching a title is taken as a positive sample and a user browsing a title without watching it is taken as a negative sample, then the amount of negative sample data will be ...
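The labeling convention and the Imbalance Rate mentioned in the background can be illustrated with a small sketch. The text does not define IR, so this assumes the common definition: the size of the majority class divided by the size of the minority class. The function `imbalance_rate` and the toy log are hypothetical.

```python
def imbalance_rate(labels):
    """Majority-class count divided by minority-class count (assumed definition)."""
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    minority, majority = sorted((positives, negatives))
    if minority == 0:
        raise ValueError("IR is undefined when one class is empty")
    return majority / minority


# Made-up log: (user, video, watched); watched=False means browsed only.
log = [("u1", "v1", True)] + [("u1", f"v{i}", False) for i in range(2, 11)]

# Watched titles become positive samples, browsed-only titles negative ones.
labels = [1 if watched else 0 for _, _, watched in log]
ir = imbalance_rate(labels)
```

Here one watched title against nine browsed-only titles gives an IR of 9.0 under the assumed definition.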


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06N20/00, G06F16/951, G06F16/9535, G06F16/958
CPC: G06N20/00, G06F16/951, G06F16/9535, G06F16/958
Inventors: 刘婵, 吴上波
Owner: SICHUAN CHANGHONG ELECTRIC CO LTD