Data processing method and apparatus

A data processing method and data processor technology, applied in the field of data processing. It addresses the difficulty of association analysis between structured and unstructured data, data inconsistency, and excess storage consumption, with the effect of avoiding data inconsistency and improving scalability.

Pending Publication Date: 2017-09-01
EMC IP HLDG CO LLC

AI Technical Summary

Problems solved by technology

First, association analysis between structured data and unstructured data is difficult, because it requires complex extract-transform-load (ETL) operations on the unstructured data beforehand. Second, extracting and storing the text data contained in unstructured data offline may cause data inconsistency and consume additional storage space.



Embodiment Construction

[0034] The principles of the disclosure will be described below with reference to several example embodiments illustrated in the accompanying drawings. It should be understood that these embodiments are described only to enable those skilled in the art to better understand and implement the present disclosure, but not to limit the scope of the present disclosure in any way.

[0035] FIG. 1 shows a block diagram of an exemplary computer system/server 12 suitable for implementing embodiments of the present disclosure. The computer system/server 12 shown in FIG. 1 is merely an example and should not place any limitation on the functionality and scope of use of the embodiments of the present disclosure.

[0036] As shown in FIG. 1, computer system/server 12 takes the form of a general-purpose computing device. Components of computer system/server 12 may include, but are not limited to, one or more processors or processing units 16, system memory 28, bus 18 connecting va...


Abstract

Embodiments of the invention provide a data processing method and apparatus. The method comprises: receiving a data loading request from a data processor; in response to receiving the data loading request, obtaining the requested original data from a data memory; in response to the original data being unstructured data, extracting the corresponding text data from the original data using a text extractor associated with the file type of the original data; and sending the text data to the data processor. With the method and apparatus, structured data and unstructured data can be processed in a unified flow; text information contained in unstructured data can be extracted in real time; association analysis of text and unstructured data can be conveniently carried out in the same analysis task; and the data inconsistency that offline processing may cause is avoided. In addition, unstructured data of various file types can be supported through a plug-in mechanism, which improves the extensibility of data processing.
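The flow described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the implementation claimed in the patent: the names `register_extractor`, `load_data`, and `STRUCTURED_TYPES`, the use of a dictionary as the data memory, and the toy `.markup` file type are all hypothetical and do not appear in the source.

```python
import re
from pathlib import PurePosixPath
from typing import Callable, Dict, Mapping, Union

# Hypothetical plug-in registry: file type (suffix) -> text extractor.
TextExtractor = Callable[[bytes], str]
_EXTRACTORS: Dict[str, TextExtractor] = {}

# Assumption: these suffixes are treated as structured data and passed through.
STRUCTURED_TYPES = {".txt", ".json", ".csv"}


def register_extractor(file_type: str) -> Callable[[TextExtractor], TextExtractor]:
    """Register a text extractor plug-in for one file type."""
    def decorator(fn: TextExtractor) -> TextExtractor:
        _EXTRACTORS[file_type.lower()] = fn
        return fn
    return decorator


@register_extractor(".markup")
def extract_toy_markup(raw: bytes) -> str:
    # Toy extractor for a made-up ".markup" type: strips <...> tags.
    # A real plug-in would wrap a proper parser for PDF, Word, and so on.
    return re.sub(r"<[^>]+>", "", raw.decode("utf-8"))


def load_data(path: str, data_memory: Mapping[str, bytes]) -> Union[bytes, str]:
    """Handle one data loading request from the data processor.

    Structured data is returned unchanged; for unstructured data, the text
    is extracted at load time by the extractor registered for its file type.
    """
    raw = data_memory[path]  # obtain the requested original data
    file_type = PurePosixPath(path).suffix.lower()
    if file_type in STRUCTURED_TYPES:
        return raw
    extractor = _EXTRACTORS.get(file_type)
    if extractor is None:
        raise LookupError(f"no text extractor registered for {file_type!r}")
    return extractor(raw)  # text data sent back to the data processor


if __name__ == "__main__":
    memory = {
        "lake/orders.csv": b"id,name\n1,x\n",
        "lake/note.markup": b"<b>hello</b> world",
    }
    print(load_data("lake/orders.csv", memory))
    print(load_data("lake/note.markup", memory))
```

Because extraction happens when the request is served, no separately stored text copy exists that could drift out of sync with the original file, and support for a new file type only requires registering another extractor.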

Description

Technical field

[0001] Embodiments of the present disclosure generally relate to data processing, and more particularly, to a method and apparatus for data processing.

Background

[0002] Currently, enterprises typically build data lakes to hold their vast amounts of data, which often include both structured and unstructured data. For example, structured data can include plain text files, JavaScript Object Notation (JSON) files, comma-separated value (CSV) files, database files, and object files, among others. Unstructured data may generally include rich-text files such as Word documents, Portable Document Format (PDF) documents, and presentation documents, as well as multimedia files such as audio files and video files. The data processing and analysis processes for these two types of data are usually different. Currently, popular big data processing frameworks, such as Hadoop, Spark, Hive, massively parallel processing (MPP) databases, et...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
CPC: G06F16/182, G06F16/254, G06F16/313, G06F16/116
Inventor: 郭小燕, 陈超, 曹逾, 周旻弘, 薛丁萌
Owner: EMC IP HLDG CO LLC