Sample serialization method and apparatus

A sample serialization technology, applied in the field of machine training, that addresses the problems of long serialization time and a mapping table too large to load into the memory of a single machine.

Active Publication Date: 2017-09-19
ZHEJIANG TMALL TECH CO LTD

AI Technical Summary

Problems solved by technology

[0010] However, the inventor found in practice that when the string set contains too many elements, the mapping table cannot be loaded into the memory of a single machine, and serializing the sample data takes a very long time. For example, with 2 billion strings, each machine needs to load the complete mapping table, which requires more than 40 GB of memory, and the serialization time is also very long.
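The figures in paragraph [0010] can be turned into a rough per-entry estimate. The following is only a back-of-the-envelope sketch: it treats "exceeds 40G" as exactly 40 GB (decimal), and the even split across a hypothetical number of servers is an assumption for illustration, not a figure from the application.

```python
# Figures taken from paragraph [0010]; everything else is an illustrative assumption.
TOTAL_STRINGS = 2_000_000_000       # 2 billion strings
TOTAL_TABLE_BYTES = 40 * 10**9      # "exceeds 40G", treated here as a 40 GB lower bound


def bytes_per_entry(total_bytes: int, entries: int) -> float:
    """Average memory cost of one mapping-table entry."""
    return total_bytes / entries


def per_server_gb(total_bytes: int, num_servers: int) -> float:
    """Table size per server in GB if entries were split evenly (assumed split)."""
    return total_bytes / num_servers / 10**9


if __name__ == "__main__":
    print(bytes_per_entry(TOTAL_TABLE_BYTES, TOTAL_STRINGS))  # at least ~20 bytes per entry
    for n in (1, 10, 100):  # hypothetical server counts
        print(n, per_server_gb(TOTAL_TABLE_BYTES, n))
```

Under these assumptions, each entry costs at least about 20 bytes, and the per-machine footprint shrinks in proportion to the number of machines sharing the table.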

Examples

Embodiment 1

[0043] Referring to Figure 1, which shows a flow chart of the steps of an embodiment of a sample serialization method of the present application, the method may specifically include the following steps:

[0044] Step 110, obtaining each character string in the sample to be serialized;

[0045] In the embodiment of this application, the serialization server first receives the sample data to be serialized. In a preferred embodiment, before step 110, the method further includes:

[0046] Step S100, obtaining each sample data to be serialized;

[0047] An embodiment of this application may have one or more serialization servers (slaves). Each serialization server can obtain the batch of sample data that it is to process, according to a notification from the scheduling server (coordinator).

[0048] In the embodiment of this application, each serialization server, each management server, and scheduling server can form a training cluster for machine training.

[0049] In another preferred embodi...

Embodiment 2

[0090] Referring to Figure 2, which shows a flow chart of the steps of an embodiment of a sample serialization method of the present application, the method may specifically include the following steps:

[0091] Step 210, receiving a character string; the character string is sent by the serialization server according to the correspondence between the character string and each management server; the character string is obtained from the sample data by the serialization server;

[0092] In this embodiment of the application, each management server receives the character string sent by one or several serialization servers.

[0093] In this embodiment of the application, on the serialization server side, a character string can be extracted from the sample data to be serialized. The management server for that character string can then be determined according to the correspondence between the character string and each management server, and the character string can be sent to that management server.
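The routing step in [0093] can be sketched as follows. The text does not state what form the correspondence between a character string and a management server takes, so this sketch assumes a stable hash modulo the number of management servers; `server_for` and `group_by_server` are hypothetical names introduced for illustration.

```python
import zlib
from collections import defaultdict


def server_for(string: str, num_servers: int) -> int:
    """Pick a management-server index for a string.

    Assumption: the correspondence is a stable hash modulo the server count.
    zlib.crc32 is used because Python's built-in hash() of str is randomized
    per process by default and would not agree across machines.
    """
    return zlib.crc32(string.encode("utf-8")) % num_servers


def group_by_server(strings, num_servers: int) -> dict:
    """Group the distinct strings of a batch by destination server,
    so that each server could be sent one request per batch."""
    groups = defaultdict(list)
    for s in sorted(set(strings)):  # dedupe; sort only for reproducible output
        groups[server_for(s, num_servers)].append(s)
    return dict(groups)


if __name__ == "__main__":
    batch = ["user_a", "item_x", "user_a", "city_hz"]  # made-up sample strings
    print(group_by_server(batch, num_servers=3))
```

Because the same string always maps to the same server index, every serialization server would send a given string to the same management server without any coordination.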

...

Embodiment 3

[0130] Referring to Figure 3, a flow chart of the steps of a preferred embodiment of a sample serialization method of the present application is shown.

[0131] In order to describe the serialization method more clearly in this embodiment, the description is made from the perspective of the overall architecture of the scheduling server, the serialization server, and the management server.

[0132] In the embodiment of the present application, the scheduling server and the serialization server may cooperate to create a mapping table for each management server, for example through steps S30 to S38.

[0133] In step S32, the scheduling server distributes all sample data evenly and, according to the distribution result, notifies each serialization server to acquire the batch of sample data assigned to it.

[0134] Before the entire training starts, after the scheduling server obtains the identification information of all sample data, it can evenly distribute all sample data. ...
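The even distribution performed by the scheduling server can be illustrated with a short sketch. The application does not say which distribution rule is used, so round-robin assignment over sample identifiers is an assumption here, and `distribute_evenly` is a hypothetical helper name.

```python
def distribute_evenly(sample_ids, num_servers: int):
    """Round-robin split of sample identifiers across serialization servers.

    Batch sizes differ by at most one. Round-robin is an assumed rule;
    the source only says the sample data is distributed evenly.
    """
    batches = [[] for _ in range(num_servers)]
    for i, sample_id in enumerate(sample_ids):
        batches[i % num_servers].append(sample_id)
    return batches


if __name__ == "__main__":
    ids = [f"s{i}" for i in range(7)]  # made-up sample identifiers
    for server_index, batch in enumerate(distribute_evenly(ids, 3)):
        # The scheduling server would notify each serialization server of its batch.
        print(server_index, batch)
```

With seven identifiers and three servers, the batches have sizes 3, 2, and 2.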

Abstract

Embodiments of the invention provide a sample serialization method and apparatus, and relate to the technical field of machine training. The method comprises the steps of obtaining character strings in to-be-serialized samples; according to corresponding relationships between the character strings and management servers, determining the management servers corresponding to the character strings; sending the character strings to the corresponding management servers, thereby enabling the management servers to convert the received character strings into corresponding serialization IDs according to mapping tables maintained by the management servers, wherein the character strings in the mapping tables maintained by different management servers are different; receiving the serialization IDs corresponding to the character strings and returned by the management servers; and according to the received serialization IDs corresponding to the character strings, converting character strings in sample data into corresponding serialization IDs. According to the method and the apparatus, the query time of the serialization IDs of the character strings is shortened, so that the sample serialization time can be shortened and the serialization efficiency can be improved.
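The flow summarized in the abstract can be sketched end to end with in-memory stand-ins for the management servers. Several details below are assumptions rather than part of the application: the hash-based routing rule, the on-demand assignment of IDs to unseen strings, and the interleaved ID scheme that keeps IDs from different servers from colliding. `ManagementServer` and `serialize_sample` are hypothetical names.

```python
import zlib


class ManagementServer:
    """In-memory stand-in for one management server and its mapping table."""

    def __init__(self, index: int, num_servers: int):
        self.index = index
        self.num_servers = num_servers
        self.table = {}  # string -> serialization ID (this server's strings only)

    def to_id(self, string: str) -> int:
        """Return the serialization ID of a string, assigning one if unseen.

        Assumed ID scheme: local_count * num_servers + index, so that IDs
        issued by different servers never collide.
        """
        if string not in self.table:
            self.table[string] = len(self.table) * self.num_servers + self.index
        return self.table[string]


def serialize_sample(sample, servers):
    """Replace each string in a sample with the ID returned by its server."""
    n = len(servers)
    result = []
    for s in sample:
        # Assumed correspondence: stable hash modulo the number of servers.
        server = servers[zlib.crc32(s.encode("utf-8")) % n]
        result.append(server.to_id(s))
    return result


if __name__ == "__main__":
    servers = [ManagementServer(i, 3) for i in range(3)]
    print(serialize_sample(["user_a", "item_x", "user_a", "city_hz"], servers))
```

Each server's table holds only the strings routed to it, which matches the abstract's statement that the mapping tables of different management servers contain different character strings.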

Description

Technical field

[0001] The present application relates to the technical field of machine training, in particular to a sample serialization method and a sample serialization device.

Background

[0002] On the Internet, a large amount of data is generated from users' network behavior. To study users' behavior habits, various models may be constructed, and machine learning systems are generally used to train these models. In a machine learning system, the character strings of each dimension in the sample data may not be serialized IDs; for example, they may not be numeric IDs but names assigned according to business requirements. If the strings of the sample data are used directly for training, the amount of calculation is relatively large and resource consumption is high.

[0003] Therefore, in order to reduce the amount of calculation, before training, it is necessary to convert the strings in all sample data into serialized...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/22, H04L29/08
CPC: H04L67/10, G06F40/151, G06F40/12, H04L65/40
Inventor: 周俊
Owner: ZHEJIANG TMALL TECH CO LTD