Extraction and statistics method and system for semi-structured big data

A statistical method for semi-structured data, applied in the fields of semi-structured data retrieval, semi-structured data mapping/conversion, and redundant-data error detection in operations. It addresses a cumbersome process, visualization defects, and the lack of automated processing, and achieves reduced data redundancy and simple, reliable operation and maintenance.

Active Publication Date: 2017-09-12
北京思特奇信息技术股份有限公司

AI Technical Summary

Problems solved by technology

The existing extraction and statistics process is cumbersome: it requires separate steps for configuration (per business scenario), execution, and viewing of results. It cannot run automatically from a single command, the way an SQL query does. The execution process and the results are also poorly visualized.



Examples


Embodiment 1

[0041] As shown in Figure 1, an extraction and statistics method for semi-structured big data includes the following steps:

[0042] S1, the client receives an operation statement, input by the user, for extracting and counting semi-structured big data, and synchronizes the operation statement to the parsing and conversion module for processing;

[0043] S2, the parsing and conversion module receives the operation statement, parses it, and converts the parsing result into configuration rules;

[0044] S3, the client calls the application engine module to generate a job task according to the configuration rules and to submit the job task to the underlying framework for processing;

[0045] S4, the underlying framework splits the job task into multiple subtasks, assigns them to the cluster for execution, and returns the result data obtained after execution to the client for display.
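
Steps S1 to S4 can be illustrated with a minimal, single-process sketch. Everything concrete here is an assumption made for illustration and does not come from the patent text: the `COUNT <field> FROM <source>` statement grammar, the dictionary shape of the configuration rule, the function names, and the use of a thread pool as a stand-in for the underlying framework and cluster.

```python
import json
import re
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical statement grammar: COUNT <dotted.field> FROM <source>
STATEMENT = re.compile(r"^\s*COUNT\s+([\w.]+)\s+FROM\s+(\w+)\s*$", re.IGNORECASE)


def parse_statement(statement):
    """S2: parse the statement and convert it into a configuration rule."""
    match = STATEMENT.match(statement)
    if match is None:
        raise ValueError(f"unsupported statement: {statement!r}")
    return {"action": "count",
            "field": match.group(1).split("."),
            "source": match.group(2)}


def generate_job(rule, sources):
    """S3: the application engine turns the rule into a job task."""
    return {"rule": rule, "records": sources[rule["source"]]}


def extract(raw_record, path):
    """Walk a dotted path inside one JSON record; None if the path is absent."""
    value = json.loads(raw_record)
    for key in path:
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def count_chunk(rule, chunk):
    """One subtask: count the extracted field values in a slice of records."""
    counts = Counter()
    for raw_record in chunk:
        value = extract(raw_record, rule["field"])
        if value is not None:
            counts[value] += 1
    return counts


def run_job(job, chunk_size=2, workers=2):
    """S4: split the job into subtasks, run them on workers, merge the results."""
    records = job["records"]
    chunks = [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(lambda chunk: count_chunk(job["rule"], chunk), chunks))
    total = Counter()
    for partial in partials:
        total.update(partial)
    return dict(total)


def client_submit(statement, sources):
    """S1: the client accepts a statement and drives the remaining steps."""
    rule = parse_statement(statement)
    job = generate_job(rule, sources)
    return run_job(job)


# Hypothetical semi-structured records, stored as JSON strings.
SOURCES = {"orders": [
    '{"id": 1, "attrs": {"channel": "web"}}',
    '{"id": 2, "attrs": {"channel": "app"}}',
    '{"id": 3, "attrs": {"channel": "web"}}',
    '{"id": 4, "attrs": {}}',
    '{"id": 5, "attrs": {"channel": "web"}}',
]}

result = client_submit("COUNT attrs.channel FROM orders", SOURCES)
print(result)
```

In this sketch the user writes a single statement and the configuration rule is derived from it, rather than being written by hand for each business scenario. A real deployment would replace the thread pool with a distributed framework.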

[0046] Specifically, the client is responsible for user interaction. T...

Embodiment 2

[0063] As shown in Figure 2, an extraction and statistics system for semi-structured big data includes:

[0064] A client module, configured to receive an operation statement, input by the user, for extracting and counting semi-structured big data; to synchronize the operation statement to the parsing and conversion module; to call the application engine module to acquire the result data after the parsing and conversion module has parsed and converted the operation statement; and to display the result data once the application engine module has acquired it;

[0065] A parsing and conversion module, configured to receive the operation statement, parse it, and convert the parsing result into configuration rules;

[0066] An application engine module, configured to generate a job task according to the configuration rules generated by the parsing and conversion module after receiving the call from the client module, submit ...


Abstract

The invention relates to an extraction and statistics method and system for semi-structured big data, and belongs to the field of big data extraction and statistics. The method and system address the problem that extraction and statistics on semi-structured big data are tedious and prone to data redundancy. A client lets the user input operation statements for extraction and statistics on semi-structured big data. The operation statements are synchronized to a parsing and conversion module, which parses them and converts the parsing results into configuration rules. The client calls an application engine module to generate a job task according to the configuration rules and submit the job task to an underlying framework. The underlying framework splits the job task into multiple subtasks, distributes the subtasks to a cluster for execution, and returns the result data obtained after execution to the client for display to the user. The method and system improve the maintainability and the level of automation and visualization of extraction and statistics on semi-structured big data, reduce data redundancy, and are simple and reliable.

Description

Technical field

[0001] The present invention relates to the field of big data extraction and statistics, in particular to an extraction and statistics method and system for semi-structured big data.

Background

[0002] Commonly used big data analysis components, such as HIVE (a data warehouse tool), require a model in which every field to be counted is a separate column. In practice, however, business requirements also constrain the data, which often must remain in a semi-structured model. Meeting both the business requirements and the statistical requirements in one model creates a significant conflict. General-purpose analysis components therefore load the business data separately into a dedicated data warehouse for processing, which leads to data redundancy. Independent extraction and statistics tools can also be developed to work on the same (semi-structured) model. However, the whole process is c...
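
The trade-off described in the background can be made concrete with a small sketch. The row layout, the field names, and both helper functions are hypothetical and chosen only for illustration. The sketch contrasts copying every nested field into its own column (a second, redundant copy of the data) with counting a field directly on the semi-structured records.

```python
import json
from collections import Counter

# Hypothetical business rows: one fixed column plus one semi-structured column.
rows = [
    {"order_id": 1, "detail": '{"product": "broadband", "region": "north"}'},
    {"order_id": 2, "detail": '{"product": "mobile", "region": "south", "promo": "A1"}'},
    {"order_id": 3, "detail": '{"product": "mobile", "region": "north"}'},
]


def flatten(row):
    """Column-per-field approach: copy each nested field into its own column."""
    flat = {"order_id": row["order_id"]}
    flat.update(json.loads(row["detail"]))
    return flat


def count_in_place(records, field):
    """Count one nested field directly on the semi-structured records."""
    counts = Counter()
    for record in records:
        detail = json.loads(record["detail"])
        if field in detail:
            counts[detail[field]] += 1
    return dict(counts)


# A flattened copy duplicates the data that already lives in `rows`.
warehouse_copy = [flatten(row) for row in rows]

# Counting in place needs no second copy.
product_counts = count_in_place(rows, "product")
print(product_counts)
```

Note that the optional `promo` field appears in only one record. A column-per-field model would need a mostly empty column for it, while the in-place count simply skips records that lack the field.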


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30, G06F11/14
CPC: G06F11/1448, G06F16/84
Inventor: 方辉盛
Owner: 北京思特奇信息技术股份有限公司