
Data processing method and device

A data processing technology in the field of data processing. It addresses three problems: two-dimensional matrices with an excessive number of rows or columns, wasted storage space, and sparse data that contains few valid features.

Active Publication Date: 2018-11-13
BEIJING SOGOU TECHNOLOGY DEVELOPMENT CO LTD
Cites: 8 | Cited by: 4

AI Technical Summary

Problems solved by technology

However, some of the data to be processed is sparse. It has many feature dimensions but few valid features, so most of its values are null.
[0004] Storing such sparse data in a two-dimensional matrix format is costly. Because of the large number of feature dimensions, the corresponding two-dimensional matrix has a very large number of rows or columns. Such a matrix wastes a great deal of storage space and raises processing costs.
Storing sparse data in some other format avoids this, but it greatly increases the difficulty and complexity of subsequent processing and learning.
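The following Python sketch makes the storage cost concrete. It places a small, hypothetical sparse dataset in a dense two-dimensional matrix and counts the null cells. The sample values, the assumed 100-dimension feature space, and the `to_dense` helper are illustrative assumptions. None of them come from the application.

```python
# Hypothetical sparse samples: each maps a feature-dimension index to a value.
# Only valid (non-null) features are listed; every other dimension is null.
samples = [
    {3: 0.5, 97: 1.0},
    {12: 2.0},
    {0: 1.0, 45: 0.25, 99: 3.0},
]
NUM_DIMENSIONS = 100  # assumed total number of feature dimensions


def to_dense(samples, num_dimensions):
    """Lay samples out as rows of a dense 2-D matrix, using None for null values."""
    matrix = []
    for sample in samples:
        row = [None] * num_dimensions
        for dim, value in sample.items():
            row[dim] = value
        matrix.append(row)
    return matrix


dense = to_dense(samples, NUM_DIMENSIONS)
total_cells = len(dense) * NUM_DIMENSIONS
null_cells = sum(cell is None for row in dense for cell in row)
print(total_cells, null_cells)  # 300 294
```

The three samples hold only 6 valid features, yet the dense layout allocates 300 cells. The other 294 cells are null.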



Embodiment Construction

[0069] Embodiments of the present application are described below in conjunction with the accompanying drawings.

[0070] A machine learning platform must store the data to be processed before it can learn from that data. To simplify later processing and learning, the data is usually stored as a two-dimensional table, that is, in a two-dimensional matrix storage format. However, some of the data to be processed is sparse. It has many feature dimensions but few valid features, and it contains many null values. If such sparse data is stored in a two-dimensional matrix format, the number of feature dimensions gives the corresponding two-dimensional matrix a very large number of rows or columns. Such matrices waste a lot of storage space an...


Abstract

The embodiments of the application disclose a data processing method comprising two steps.

First, a predetermined number of features is determined from the number of valid features in each sample of the data to be processed. The data to be processed is sparse data containing multiple samples. The predetermined number of features is the maximum number of valid features that any single sample occupies in the two-dimensional matrix corresponding to the data to be processed.

Second, the two-dimensional matrix is constructed from the valid features in the data to be processed. Each target valid feature is placed in the row or column of its target sample. It carries a dimension identifier that identifies its feature dimension in the data to be processed. Here, the target sample is any sample in the data to be processed, and the target valid feature is any valid feature included in that sample.

This technical solution has three effects. It effectively eliminates null values in sparse data, greatly reduces the number of rows or columns in the two-dimensional matrix corresponding to the data to be processed, and lowers storage costs.
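The two steps in the abstract can be sketched in Python as follows, using a small hypothetical dataset. This is a minimal illustration under stated assumptions, not the applicant's implementation. The sketch makes these assumptions:

- Each sample is a dict mapping a dimension index to a value.
- Each cell stores a `(dimension_id, value)` tuple as its dimension identifier.
- One sample occupies one row. The abstract also allows a column.
- Shorter rows are padded with `None`. The abstract does not specify how shorter rows are filled.

The helper names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Assumed cell layout: (dimension identifier, feature value); None marks padding.
Cell = Optional[Tuple[int, float]]


def predetermined_feature_count(samples: List[Dict[int, float]]) -> int:
    """Step 1: the maximum number of valid features contained in any one sample."""
    return max((len(sample) for sample in samples), default=0)


def build_compact_matrix(samples: List[Dict[int, float]]) -> List[List[Cell]]:
    """Step 2: one row per sample, each valid feature tagged with its dimension id.

    Padding shorter rows with None is an assumption of this sketch.
    """
    width = predetermined_feature_count(samples)
    matrix: List[List[Cell]] = []
    for sample in samples:
        row: List[Cell] = [(dim, value) for dim, value in sorted(sample.items())]
        row.extend([None] * (width - len(row)))
        matrix.append(row)
    return matrix


def restore_sample(row: List[Cell]) -> Dict[int, float]:
    """Recover the original sparse sample by reading each cell's dimension identifier."""
    return {cell[0]: cell[1] for cell in row if cell is not None}


samples = [
    {3: 0.5, 97: 1.0},
    {12: 2.0},
    {0: 1.0, 45: 0.25, 99: 3.0},
]
compact = build_compact_matrix(samples)
for row in compact:
    print(row)
# [(3, 0.5), (97, 1.0), None]
# [(12, 2.0), None, None]
# [(0, 1.0), (45, 0.25), (99, 3.0)]
```

For these three samples, the compact matrix has 3 × 3 = 9 cells. A dense layout over 100 dimensions would need 3 × 100 = 300. `restore_sample` shows why the dimension identifier matters: once features are packed to the left of a row, a value's position no longer reveals its original feature dimension.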

Description

Technical field

[0001] The present application relates to the field of data processing, and in particular to a data processing method and device.

Background art

[0002] With the development of data processing technology, machines can process and learn from massive amounts of data. Various machine learning platforms now exist for this purpose, such as the MXNet platform.

[0003] A machine learning platform must store the data to be processed before it can learn from that data. The storage format best suited to later processing and learning is a two-dimensional table, that is, a two-dimensional matrix storage format. However, some of the data to be processed is sparse. It has many feature dimensions but few valid features, and it contains many null values.

[0004] As a result, if this kind of sparse data is to be stored in a two-dimensional matrix storage format, the feature dimension of t...


Application Information

IPC(8): G06F17/30
Inventor 王潇
Owner BEIJING SOGOU TECHNOLOGY DEVELOPMENT CO LTD