Model training method and device for training data

A technology for training data and model training, applied in the computing field. It addresses wasted storage space, insufficient data randomness and poor model performance, and it saves memory, preserves randomness and improves model training.

Active Publication Date: 2015-11-11
SHENZHEN TENCENT COMP SYST CO LTD
Cites: 6; Cited by: 23

AI Technical Summary

Problems solved by technology

[0003] Training data generated for advertising click-through rate estimation often contain a large number of repeated records, and these repeated records waste a great deal of storage space.
If the repeated records are aggregated so that only one copy of each is kept, memory is saved, but identical records end up grouped together in one place. This destroys the uniform distribution of the data and breaks its original randomness.
However, the SGD algorithm needs a training data set whose randomness is guaranteed in order to produce good training results, so training on an aggregated training data set often yields a poor model.
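A minimal illustration of the problem, using made-up toy records rather than real advertising data: if aggregation is implemented as storing each distinct record once together with a repeat count (one common approach, assumed here for illustration), then replaying the aggregated form in storage order places all copies of a record next to each other.

```python
from collections import Counter

# Hypothetical toy stream; each string stands for one full training record.
# In the original stream the duplicates are interleaved with other records.
original = ["a", "b", "a", "c", "b", "a"]

# Naive aggregation: one copy per distinct record plus its repeat count.
# Counter keeps first-seen order (Python 3.7+ dict ordering).
aggregated = Counter(original)

# Replaying in storage order groups every copy of a record together,
# so the sample order is no longer randomly mixed.
replayed = [record for record, count in aggregated.items() for _ in range(count)]
print(replayed)  # ['a', 'a', 'a', 'b', 'b', 'c']
```

An SGD pass over `replayed` would see three consecutive updates from `a`, then two from `b`, which is the clustering effect described above.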

Examples

Embodiment 1

[0036] This embodiment is described from the perspective of a model training device for training data (referred to below simply as the model training device), which may specifically be integrated in a network device such as a server or a gateway.

[0037] A model training method for training data comprises: obtaining original training data; aggregating the original training data to obtain aggregated training data; establishing an index vector according to the original training data and the aggregated training data, where the absolute value of each value of the index vector indicates the position, in the aggregated training data, of the corresponding piece of original training data; randomly reading values of the index vector and obtaining the corresponding training data from the aggregated training data according to each value; and performing model training with the obtained training data.
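The steps of [0037] can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: records are assumed to be hashable values (for example tuples of feature IDs), the helper names `aggregate_with_index` and `random_read` are hypothetical, and "randomly reading" is interpreted as visiting the index vector in a freshly shuffled order, so each original record is read exactly once per epoch. The excerpt does not explain why the claim takes the absolute value of an index value, so this sketch only stores non-negative positions and applies `abs()` just to mirror the claim wording.

```python
import random


def aggregate_with_index(original):
    """Aggregate duplicate records and build an index vector (hypothetical helper).

    aggregated:   one copy of each distinct record, in first-seen order
    index_vector: index_vector[i] is the position in `aggregated` of
                  original record i (always non-negative in this sketch)
    """
    position = {}  # distinct record -> its slot in `aggregated`
    aggregated = []
    index_vector = []
    for record in original:
        if record not in position:
            position[record] = len(aggregated)
            aggregated.append(record)
        index_vector.append(position[record])
    return aggregated, index_vector


def random_read(aggregated, index_vector, rng):
    """Yield training records by reading the index vector in random order.

    abs() only mirrors the claim wording; the excerpt does not say what a
    negative index value would encode, so none is ever stored here.
    """
    order = list(range(len(index_vector)))
    rng.shuffle(order)  # one random permutation = one randomized epoch
    for i in order:
        yield aggregated[abs(index_vector[i])]


original = ["a", "b", "a", "c", "b", "a"]
aggregated, index_vector = aggregate_with_index(original)
print(aggregated)    # ['a', 'b', 'c']
print(index_vector)  # [0, 1, 0, 2, 1, 0]
epoch = list(random_read(aggregated, index_vector, random.Random(0)))
print(sorted(epoch) == sorted(original))  # True: same records, random order
```

Each distinct record is stored once in `aggregated`, while the index vector holds one small integer per original record; shuffling positions of the index vector, rather than `aggregated` itself, restores the per-sample random order that SGD expects.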

[0038] Please refer to FIG. 1. FIG. 1 is a schem...

Embodiment 2

[0065] Building on the method described in the first embodiment, a further example is given below for a more detailed description.

[0066] Refer to Figure 2a, which is a schematic flow chart of the model training method for training data provided by the second embodiment of the present invention. The method includes:

[0067] In step S201, original training data is acquired.

[0068] In step S202, the original training data is aggregated to obtain aggregated training data.

[0069] Steps S201 and S202 may specifically be implemented as follows:

[0070] For example, if the original training data contain M pieces of training data, the repeated pieces among these M pieces are aggregated so that only one copy of each is kept. The retained copies are used as new training data to form the aggregated training data, and it is determined that the aggregated training data contain N pieces of training data, where M and N are positive in...
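The memory trade-off behind this step reduces to simple arithmetic. Storing M full records costs M × r bytes, while the aggregated form costs N × r bytes for the records plus M × s bytes for an index vector, where r and s are a per-record size and a per-index-entry size introduced here only for illustration. Aggregation therefore saves memory whenever (M − N) × r > M × s. The sketch below assumes fixed-size records and 4-byte index entries; the numbers in the usage line are hypothetical, not measurements from the patent.

```python
def storage_bytes(m, n, bytes_per_record, bytes_per_index=4):
    """Compare raw storage with aggregated storage plus an index vector.

    m: pieces of original training data (M in the text)
    n: pieces of aggregated training data (N in the text); aggregation
       only removes duplicates, so 0 < n <= m
    bytes_per_record, bytes_per_index: hypothetical fixed sizes
    """
    if not 0 < n <= m:
        raise ValueError("expected 0 < N <= M")
    raw = m * bytes_per_record
    aggregated = n * bytes_per_record + m * bytes_per_index
    return raw, aggregated


# Hypothetical sizes: 1,000,000 records of 400 bytes, 100,000 distinct.
raw, agg = storage_bytes(m=1_000_000, n=100_000, bytes_per_record=400)
print(raw, agg)  # 400000000 44000000
```

The index vector is the price paid for keeping the original sample order recoverable; it stays small relative to the savings as long as records are much larger than one index entry.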

Embodiment 3

[0112] To facilitate implementation of the model training method for training data provided by the embodiments of the present invention, an embodiment of the present invention further provides a model training device based on that method. Terms have the same meanings as in the method described above; for specific implementation details, refer to the description in the method embodiments.

[0113] Please refer to FIG. 3. FIG. 3 is a schematic structural diagram of a model training device for training data provided by an embodiment of the present invention. The model training device may include an acquisition unit 301, an aggregation unit 302, a vector establishment unit 303, a reading unit 304 and a training unit 305, as follows:

[0114] The acquisition unit 301 is configured to obtain the original training data;

[0115] The aggregation unit 302 is configured...
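One way to picture the device of FIG. 3 is a class with one method per unit. The sketch below is hypothetical: the class name, method names and the `train_step` callback are illustrative assumptions, the description of units 302 to 305 is truncated in this excerpt, and each method simply performs the corresponding step of the method claim (aggregation keeps one copy per distinct record, the index vector maps each original position to an aggregated slot, and reading visits index values in shuffled order).

```python
import random
from typing import Callable, Hashable, Iterable, Iterator, List


class ModelTrainingDevice:
    """Hypothetical sketch: one method per unit 301-305 of FIG. 3."""

    def __init__(self, train_step: Callable[[Hashable], None], seed: int = 0):
        self._train_step = train_step  # model update used by the training unit
        self._rng = random.Random(seed)

    def acquire(self, source: Iterable[Hashable]) -> List[Hashable]:
        """Acquisition unit 301: obtain the original training data."""
        return list(source)

    def aggregate(self, original: List[Hashable]) -> List[Hashable]:
        """Aggregation unit 302: keep one copy of each distinct record."""
        return list(dict.fromkeys(original))  # preserves first-seen order

    def build_index(self, original: List[Hashable],
                    aggregated: List[Hashable]) -> List[int]:
        """Vector establishment unit 303: original position -> aggregated slot."""
        slot = {record: k for k, record in enumerate(aggregated)}
        return [slot[record] for record in original]

    def read(self, aggregated: List[Hashable],
             index_vector: List[int]) -> Iterator[Hashable]:
        """Reading unit 304: read index values in random order, then look up."""
        order = list(range(len(index_vector)))
        self._rng.shuffle(order)
        for i in order:
            yield aggregated[abs(index_vector[i])]

    def train(self, samples: Iterable[Hashable]) -> None:
        """Training unit 305: feed each sample to the model update."""
        for sample in samples:
            self._train_step(sample)

    def run(self, source: Iterable[Hashable]) -> None:
        original = self.acquire(source)
        aggregated = self.aggregate(original)
        index_vector = self.build_index(original, aggregated)
        self.train(self.read(aggregated, index_vector))


# Usage: record what the training unit sees instead of updating a real model.
seen: List[Hashable] = []
ModelTrainingDevice(seen.append, seed=1).run(["a", "b", "a", "c"])
print(sorted(seen))  # ['a', 'a', 'b', 'c']
```

Passing the model update in as a callback keeps the data-handling units independent of whichever model the training unit drives.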

Abstract

The invention discloses a model training method and device for training data. The method comprises: obtaining original training data; aggregating the original training data to obtain aggregated training data; establishing an index vector according to the original training data and the aggregated training data, where the absolute value of each index vector value indicates the position, in the aggregated training data, of the corresponding piece of original training data; randomly reading values of the index vector and obtaining the corresponding training data from the aggregated training data according to each value; and performing model training with the obtained training data. Even though the training data are aggregated, randomly reading the values of the index vector still retrieves the corresponding training data from the aggregated training data, which guarantees the randomness of the data used for model training and so improves the training effect while saving memory.

Description

Technical field

[0001] The invention belongs to the field of computing, and in particular relates to a model training method and device for training data.

Background

[0002] Click-through rate estimation for online advertisements plays an important role in advertising. At present, the industry mainly uses simple linear models such as logistic regression (LR) to model advertisement click-through rates. Solving such a model is simple and fast, and to some extent it prevents overfitting of the data. Although the LR model is simple to solve, in the era of big data it is still necessary to fully exploit the computing performance of LR. Stochastic gradient descent (SGD) is an optimization algorithm commonly used to train LR models, and it converges quickly in scenarios with massive data.

[0003] In the training data generated for advertisi...
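For reference, the SGD training of an LR model mentioned in [0002] can be sketched as textbook per-sample SGD on the log loss. This is a generic illustration, not Tencent's implementation; the learning rate, epoch count and toy data are arbitrary assumptions. The per-epoch shuffle marks the point where the patented method would instead obtain samples by randomly reading the index vector.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sgd_logistic_regression(samples, dim, lr=0.1, epochs=20, seed=0):
    """Textbook SGD for logistic regression on (features, label) pairs.

    Each update uses a single sample; the per-epoch shuffle supplies the
    random sample order that SGD relies on.
    """
    rng = random.Random(seed)
    data = list(samples)
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            g = p - y  # derivative of the log loss with respect to the logit
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b


# Toy, hypothetical data: one feature; positive values are labeled "clicked".
toy = [([1.0], 1), ([-1.0], 0), ([2.0], 1), ([-2.0], 0)]
w, b = sgd_logistic_regression(toy, dim=1)
print(sigmoid(w[0] * 1.5 + b) > 0.5)  # True
```

Because every update depends on one sample, feeding long runs of identical records (as in a naively aggregated data set) pushes the weights repeatedly in the same direction, which is why the sample order matters.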

Application Information

Patent Type & Authority: Application (China)
IPC(8): G06F17/30
CPC: G06F16/24556
Inventor: 李超 (Li Chao)
Owner: SHENZHEN TENCENT COMP SYST CO LTD