Data preprocessing method and equipment and storage medium

A data preprocessing technology, applied in electrical digital data processing, program control design, instruments, and related fields, that addresses the lack of standardization in preprocessing steps and reduces waiting between steps.

Pending Publication Date: 2019-12-17
ZTE CORP

AI Technical Summary

Problems solved by technology

[0004] In view of this, the purpose of the embodiments of the present invention is to provide a data preprocessing method, device, and storage medium that solve the following technical problems: the individual preprocessing steps are not standardized, the steps are largely performed manually, and the generated results must be checked by manual polling.



Examples


Embodiment 1

[0020] As shown in Figure 1, a data preprocessing method provided by an embodiment of the present invention includes:

[0021] S101. Monitor the path where the original data is located.

[0022] Specifically, the original data may be artificial intelligence training data that requires preprocessing, or big data that requires data cleaning. The data files may be in various formats, including but not limited to images, text, and tables.

[0023] S102. When unprocessed original data is detected, execute the preprocessing script or program corresponding to each step, in the execution order preset for the steps in the configuration file.

[0024] The preprocessing scripts or programs may be implemented in the same or in different programming languages. Each one preprocesses the data under its data input path and saves the preprocessing result to its data output path.
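Steps S101 and S102 can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the dictionary keys (`order`, `command`, `input_path`, `output_path`), and the choice to pass both paths to each command as trailing arguments are assumptions made for illustration. A single polling pass stands in for continuous monitoring.

```python
import subprocess
from pathlib import Path


def find_unprocessed(input_dir: Path, seen: set) -> list:
    """Return files under input_dir whose names have not been handled yet."""
    return sorted(p for p in input_dir.iterdir() if p.is_file() and p.name not in seen)


def run_steps(steps: list) -> bool:
    """Run each step's command in the configured order; stop at the first failure."""
    for step in sorted(steps, key=lambda s: s["order"]):
        # Assumption: each script takes its input and output paths as arguments.
        cmd = step["command"] + [step["input_path"], step["output_path"]]
        if subprocess.run(cmd).returncode != 0:
            return False
    return True


def poll_once(input_dir: Path, steps: list, seen: set) -> bool:
    """One monitoring pass: run the steps only if new original data is present."""
    new_files = find_unprocessed(input_dir, seen)
    if not new_files:
        return False
    ok = run_steps(steps)
    if ok:
        # Remember the handled files so the next pass does not reprocess them.
        seen.update(p.name for p in new_files)
    return ok
```

Calling `poll_once` in a loop with a short sleep would approximate the monitoring in S101. Because each command is an arbitrary argument list, the steps can be written in different languages, as paragraph [0024] allows.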

[0025] Specifically, in the data preprocessing proces...

Embodiment 2

[0031] As shown in Figure 3, a data preprocessing method provided by an embodiment of the present invention includes:

[0032] S301. Predefine a configuration file for data preprocessing.

[0033] The preprocessing configuration file is defined according to the actual application scenario. It predefines all steps of data preprocessing and their execution order, the data input path, the data output path, the entry script, and the subtask scripts or programs corresponding to each step. The entry script defines the execution order of the subtask scripts or programs: it calls each script or program in the step and, according to the return value, decides whether to exit abnormally or continue to the next step. The subtask scripts or programs are written according to preset rules to implement the function of the step. There are no requirements for the number of subtask scripts o...
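The configuration file and entry script described in [0033] might look roughly like the following sketch. The source does not specify a file format, so the JSON layout, the field names, and the script names (`convert.py`, `rename.sh`, `clean.py`) are hypothetical. `run_entry` mirrors the stated entry-script behaviour: call each subtask in turn and exit abnormally on a non-zero return value.

```python
import json
import subprocess

# Hypothetical configuration; field names and script names are illustrative only.
CONFIG_TEXT = """
{
  "steps": [
    {"name": "clean", "order": 2,
     "input_path": "data/stage1", "output_path": "data/stage2",
     "subtasks": [["python", "clean.py"]]},
    {"name": "convert", "order": 1,
     "input_path": "data/raw", "output_path": "data/stage1",
     "subtasks": [["python", "convert.py"], ["sh", "rename.sh"]]}
  ]
}
"""

REQUIRED_FIELDS = {"name", "order", "input_path", "output_path", "subtasks"}


def load_steps(text):
    """Parse the configuration and return the steps sorted by execution order."""
    steps = json.loads(text)["steps"]
    for step in steps:
        missing = REQUIRED_FIELDS - set(step)
        if missing:
            raise ValueError(f"step is missing fields: {sorted(missing)}")
    return sorted(steps, key=lambda s: s["order"])


def run_entry(step, runner=subprocess.call):
    """Entry-script logic: run subtasks in order, stop on a non-zero return value."""
    for command in step["subtasks"]:
        code = runner(command + [step["input_path"], step["output_path"]])
        if code != 0:
            return code  # abnormal exit; the caller should not start the next step
    return 0
```

The `runner` parameter is a convenience added here so the control flow can be exercised without real scripts; by default it is `subprocess.call`, which returns the exit code of the command.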

Embodiment 3

[0049] As shown in Figure 4, this embodiment of the present invention is described by taking face recognition as an example.

[0050] Take the face image preprocessing required for face recognition as an example. One source of face images is celebrity faces collected from the Internet. Collected faces belonging to the same person are stored under the same path. These photos can be used for face model training only after four preprocessing steps: face detection, face positioning, face calibration, and face deduplication. Each of the four steps has a corresponding processing script or program, but all are run manually; the present invention can be applied as follows to automate them.

[0051] Step S401: predefine the preprocessing steps for the face recognition data as four steps: face detection, face positioning, face calibration, and face deduplication.

[0052] Step S402, respectively predefine the subtasks of each step, and write t...
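As an illustration of how the four face preprocessing steps named in this embodiment could be laid out, the sketch below builds one configuration entry per step. It assumes that each step's data output path serves as the next step's data input path; the source states only that every step reads from an input path and writes to an output path, so this chaining, the directory names, and the helper `chain_steps` are hypothetical.

```python
from pathlib import PurePosixPath

# Step names taken from the embodiment; the identifiers themselves are illustrative.
FACE_STEPS = [
    "face_detection",
    "face_positioning",
    "face_calibration",
    "face_deduplication",
]


def chain_steps(names, raw_dir, work_dir):
    """Assign input and output paths so that one step's output feeds the next."""
    steps = []
    current_input = PurePosixPath(raw_dir)
    for order, name in enumerate(names, start=1):
        output = PurePosixPath(work_dir) / f"{order:02d}_{name}"
        steps.append({
            "order": order,
            "name": name,
            "input_path": str(current_input),
            "output_path": str(output),
        })
        current_input = output  # assumption: the next step reads this step's output
    return steps


if __name__ == "__main__":
    for step in chain_steps(FACE_STEPS, "faces/raw", "faces/work"):
        print(step["order"], step["name"], step["input_path"], "->", step["output_path"])
```

With this layout, the output directory of the last step would hold the images ready for face model training.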


Abstract

The embodiment of the invention discloses a data preprocessing method, equipment, and storage medium, and belongs to the field of data preprocessing. The method comprises: monitoring a path where original data is located; and, when unprocessed original data is detected, executing the preprocessing script or program corresponding to each step according to the execution sequence of the steps preset in a configuration file. The configuration file presets all steps of data preprocessing and their execution sequence, the data input path and data output path corresponding to each step, and the preprocessing scripts or programs. According to the embodiment of the invention, each step is standardized and all steps are driven by data: the preprocessing script or program reads the data from the data input path and stores the generated result in the data output path. The method is therefore suitable for various kinds of data and various programming and scripting languages. In addition, the user does not need to poll for the execution result of each step, and waiting for execution results between steps is reduced.

Description

Technical field

[0001] The invention relates to the field of data preprocessing, in particular to an artificial intelligence data preprocessing method, device, and storage medium.

Background

[0002] Artificial intelligence model training requires training data that comes from many sources and differs in file format, data content, and the scripts or programs used to process it. The data must be preprocessed before it can be used for model training. Different tasks (faces, human figures, vehicles) and different algorithms, such as face recognition or MTCNN (multi-task convolutional neural networks), require different preprocessing scripts; the required preprocessing steps vary, and the scripts take varying lengths of time to run.

[0003] At present, data preprocessing focuses on a specific step, focusing on the automatic processing of file formats and different field types. There is no standardizat...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F9/448
CPC: G06F9/4482
Inventor: 陈小强
Owner: ZTE CORP