Method and device for processing training data

A training data processing method and device in the field of computing. It addresses two problems: storing a model in a sparse structure consumes considerable storage space, and it slows both the model's network transmission and its training update speed. The technology improves network transmission speed and reduces storage consumption.

Active Publication Date: 2018-04-20
SHENZHEN TENCENT COMP SYST CO LTD
3 cites, 0 cited by

AI Technical Summary

Problems solved by technology

[0004] Storing the LR model in a hash table makes model training and prediction convenient. However, because the hash table is a sparse storage structure, it occupies a relatively large amount of storage space, which reduces the model's network transmission speed and its training update speed.
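The sketch below gives a rough illustration of the storage point in [0004]. It compares a Python dict keyed by nominal feature ids with a contiguous float32 `array` that holds the same weights at compact positions. The dict stands in for the hash table as an assumption, because the patent does not say how its hash table is implemented. The helper names and feature ids are hypothetical. The byte counts are CPython-specific estimates, not figures from the patent.

```python
import sys
from array import array


def hash_table_bytes(model: dict) -> int:
    """Rough CPython footprint of a dict model: the table plus every key and value object."""
    total = sys.getsizeof(model)
    for key, value in model.items():
        total += sys.getsizeof(key) + sys.getsizeof(value)
    return total


def dense_bytes(weights: array) -> int:
    """Payload size of a contiguous float32 weight array."""
    return weights.itemsize * len(weights)


# Hypothetical model: 1,000 active features scattered over a huge nominal id space.
nominal_ids = [i * 4_000_003 for i in range(1, 1001)]
sparse_model = {fid: fid * 1e-12 for fid in nominal_ids}

# The same weights stored densely: position p holds the weight of nominal_ids[p].
dense_model = array("f", [sparse_model[fid] for fid in nominal_ids])

print(hash_table_bytes(sparse_model), dense_bytes(dense_model))
```

The dense array cannot be looked up by nominal id. For that reason a global index is needed once, to remap the data, and after that training touches only compact positions.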



Examples


Example 1

[0034] This embodiment is described from the perspective of a training data processing device. The device may be integrated into a network device such as a server or a gateway.

[0035] A training data processing method comprises the following steps:
- Obtain the training data and determine its original feature space. The original feature space is the original storage structure of the training data's feature data.
- Perform scan statistics on the original feature space and establish a global index.
- Map the training data to the actual feature space according to the global index. The actual feature space is a storage structure formed from the positions in the original feature space where feature data is actually stored.
- Train the model using the training data in the actual feature space.
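The four steps of [0035] can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patent's implementation. Samples are assumed to be sparse dicts that map a nominal feature id to a value. The global index assigns positions in ascending id order, although any deterministic order would work. Plain SGD logistic regression stands in for the model training step. `build_global_index`, `map_to_actual_space`, and `train_lr` are hypothetical names.

```python
import math
from typing import Dict, List, Tuple

# A sample is a sparse row {nominal_feature_id: value} plus a 0/1 label.
Sample = Tuple[Dict[int, float], int]


def build_global_index(samples: List[Sample]) -> Dict[int, int]:
    """Scan the original (nominal) feature space and give every feature id
    that actually occurs a compact position 0..k-1 (ascending id order)."""
    used_ids = sorted({fid for features, _ in samples for fid in features})
    return {fid: pos for pos, fid in enumerate(used_ids)}


def map_to_actual_space(samples: List[Sample], index: Dict[int, int]) -> List[Sample]:
    """Map each row once, replacing nominal ids with dense positions."""
    return [({index[fid]: v for fid, v in features.items()}, label)
            for features, label in samples]


def train_lr(samples: List[Sample], dim: int, lr: float = 0.1, epochs: int = 50):
    """Plain SGD logistic regression; weights live in a dense list of length dim."""
    weights = [0.0] * dim
    bias = 0.0
    for _ in range(epochs):
        for features, label in samples:
            z = bias + sum(weights[p] * v for p, v in features.items())
            grad = 1.0 / (1.0 + math.exp(-z)) - label  # d(log loss)/dz
            for p, v in features.items():
                weights[p] -= lr * grad * v
            bias -= lr * grad
    return weights, bias


# Hypothetical training data: a few active ids spread over a huge nominal space.
raw = [
    ({10_000_019: 1.0, 3_000_000_017: 1.0}, 1),
    ({10_000_019: 1.0, 777: 1.0}, 0),
    ({3_000_000_017: 1.0, 42: 1.0}, 1),
]
index = build_global_index(raw)
dense = map_to_actual_space(raw, index)
weights, bias = train_lr(dense, dim=len(index))
print(index)  # {42: 0, 777: 1, 10000019: 2, 3000000017: 3}
```

The weight vector has one slot per feature that actually occurs (4 here), rather than one slot per id in the nominal space.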

[0036] See Figure 1b, which is a schematic flowchart of the training data processing method provided by the first em...

Example 2

[0065] Building on the method described in the first embodiment, a detailed example is given below.

[0066] See Figure 2a, which is a schematic flowchart of the training data processing method provided by the second embodiment of the present invention. The method includes:

[0067] In step S201, training data is acquired.

[0068] In step S202, scan statistics are performed on the original feature space to determine the positions in it where feature data is actually stored.
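Step S202 can be illustrated with a simple counting pass. The sketch makes two assumptions: each row is a list of nominal feature ids, and the original feature space has a hypothetical size of 2**32. It reports how often each id occurs and what fraction of the nominal space actually holds data. A global index could be built from statistics of this kind. The function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, List

NOMINAL_SIZE = 2 ** 32  # hypothetical size of the original feature space


def scan_statistics(rows: Iterable[List[int]]) -> Counter:
    """Count how many times each nominal feature id actually occurs."""
    counts: Counter = Counter()
    for feature_ids in rows:
        counts.update(feature_ids)
    return counts


def occupancy(counts: Counter, nominal_size: int) -> float:
    """Fraction of the nominal feature space that actually stores data."""
    return len(counts) / nominal_size


rows = [[5, 900_000_001], [5, 17], [17, 900_000_001, 3_999_999_999]]
counts = scan_statistics(rows)
print(sorted(counts.items()))  # [(5, 2), (17, 2), (900000001, 2), (3999999999, 1)]
print(occupancy(counts, NOMINAL_SIZE))
```

The ids with a nonzero count are the only positions a global index needs to cover. The rest of the nominal space holds no data.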

[0069] Steps S201 and S202 may specifically be carried out as follows:

[0070] For example, the training data may be historical data represented as a matrix. Each row of the matrix is one historical record and contains independent variables X (such as user features and advertisement features), a dependent variable y (such as whether the user clicked the advertisement), and other feature dat...
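[0070] is truncated before it says how such records become matrix rows. One common approach, assumed here, is to hash each `name=value` feature into a large nominal id space and keep the click label as y. The sketch uses `zlib.crc32` as the hash and a hypothetical nominal size of 2**32. The record fields and helper names are invented for illustration.

```python
import zlib
from typing import Dict, List, Tuple

NOMINAL_SIZE = 2 ** 32  # hypothetical size of the original feature space


def feature_id(name: str, value: str) -> int:
    """Hash one categorical feature into the nominal id space (assumed scheme)."""
    return zlib.crc32(f"{name}={value}".encode("utf-8")) % NOMINAL_SIZE


def to_row(record: Dict[str, str], label_key: str = "clicked") -> Tuple[List[int], int]:
    """Split one historical record into sparse X (feature ids) and label y."""
    y = int(record[label_key])
    x = sorted(feature_id(k, v) for k, v in record.items() if k != label_key)
    return x, y


# Hypothetical historical records: user features, ad features, click label.
history = [
    {"user_age": "25-34", "user_city": "Shenzhen", "ad_category": "games", "clicked": "1"},
    {"user_age": "18-24", "user_city": "Beijing", "ad_category": "travel", "clicked": "0"},
]
matrix = [to_row(r) for r in history]
for x, y in matrix:
    print(len(x), y)
```

Every row touches only a handful of ids out of the 2**32 possible ones. This is the sparsity that the scan statistics and the global index exploit.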

Example 3

[0105] To better implement the training data processing method provided by the embodiments of the present invention, an embodiment further provides a device based on that method. Terms have the same meanings as in the method. For specific implementation details, refer to the description of the method embodiments.

[0106] See Figure 3, which is a schematic structural diagram of a training data processing device provided in an embodiment of the present invention. The device includes an acquisition unit 301, an index establishment unit 302, a mapping unit 303, and a training unit 304, as follows:

[0107] The acquisition unit 301 is configured to acquire training data and determine the original feature space of the training data, where the original feature space is the original feature data storage structure of...


Abstract

The invention discloses a training data processing method and apparatus. The method comprises the following steps:
- Obtain training data and determine its original feature space, which is the original feature data storage structure of the training data.
- Perform scan statistics on the original feature space and establish a global index from the statistical result.
- Map the training data to the actual feature space according to the global index. The actual feature space is a storage structure built from the positions in the original feature space where feature data is actually stored.
- Train the model using the training data in the actual feature space.

In the embodiments, the training data is mapped once through the global index, converting a sparse storage structure into a dense one. This greatly reduces storage consumption and improves both the network transmission speed and the training update speed of the model.

Description

Technical field

[0001] The invention belongs to the technical field of computing, and in particular relates to a training data processing method and device.

Background

[0002] When a user browses a page, the advertising platform estimates the click-through rate of every candidate advertisement. The estimate is based on the user's browsing or search behavior and on the page content. The platform then gives priority delivery to the advertisements with higher estimated click-through rates. Click-through rate estimation plays an important role in advertising.

[0003] At present, the industry mainly uses simple linear models such as logistic regression (LR) to model advertising click-through rates. Solving such a model is simple and fast, and it prevents overfitting to a certain extent. Because the model uses a large number of features during training and click-through rate estimation, the nominal space ran...
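The LR-based ad selection described in [0002] and [0003] can be sketched as below. The sketch assumes binary features, so the linear score is just the bias plus the weights of the active feature ids. The weights are kept in a dict (a hash table), as in [0004]. The weights, bias, feature ids, and ad names are hypothetical, and the function names are illustrative rather than taken from the patent.

```python
import math
from typing import Dict, List


def estimate_ctr(weights: Dict[int, float], bias: float, feature_ids: List[int]) -> float:
    """LR click-through estimate for binary features: sigmoid(bias + sum of active weights).
    Feature ids missing from the model contribute 0."""
    z = bias + sum(weights.get(fid, 0.0) for fid in feature_ids)
    return 1.0 / (1.0 + math.exp(-z))


def pick_ad(weights: Dict[int, float], bias: float,
            user_features: List[int], candidates: Dict[str, List[int]]) -> str:
    """Score every candidate ad for this user and return the highest-CTR one."""
    return max(candidates,
               key=lambda ad: estimate_ctr(weights, bias, user_features + candidates[ad]))


weights = {101: 0.4, 202: 1.2, 303: -0.8}  # hypothetical hash-table model
user = [101]
ads = {"ad_a": [202], "ad_b": [303]}
print(pick_ad(weights, -2.0, user, ads))  # ad_a
```

Here ad_a scores sigmoid(-0.4) and ad_b scores sigmoid(-2.4), so ad_a is delivered first.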


Application Information

Patent type & authority: Patents (China)
IPC(8): G06F17/30
CPC: G06F16/1737, G06F16/2255
Inventor: 李超 (Li Chao)
Owner: SHENZHEN TENCENT COMP SYST CO LTD