Method and system for parallel batch importing of data into read-only query system

A read-only query system technology applied in the field of cloud computing. It addresses problems such as low import efficiency, heavy consumption of CPU and memory resources, and reduced service throughput of the online system, with the effect of enhancing service capability, reducing switching time, and avoiding consumption of computing resources.

Status: Inactive. Publication Date: 2013-09-11

PEKING UNIV

Cites: 4. Cited by: 10.


AI Technical Summary

Problems solved by technology

This mode is simple and reliable, but the data import process has two problems: 1. The data is generally copied serially through the read-write interface of the online system, so at any given time only a limited number of nodes in the online system are performing storage operations, and efficiency is low. 2. After the serial copy completes, the online system builds indexes and creates replicas for the new data, which consumes substantial CPU and memory resources and reduces the service throughput of the online system.



Examples


Embodiment 1

[0059] Raw data sets of 256 MB to 1280 MB are processed with Cassandra and with MySQL as the online system, respectively. Timing starts when processing begins and runs until the data has been completely imported into the Cassandra or MySQL online system and is serving queries.

[0060] Figure 6 compares data size against processing time with Cassandra and with MySQL as the online system, in an embodiment of the method of the present invention for parallel batch import of data into a read-only query system. At every data scale tested, the Cassandra system deployed with the present invention takes less time to process and import the data than MySQL. Because processing is parallel, it also avoids the slowdown that MySQL shows as data accumulates.

Embodiment 2

[0062] Building on Embodiment 1, the time the online system spends under high load is measured, where high load is defined as a CPU load above 80%.

[0063] Figure 7 compares data size against CPU high-load time with Cassandra and with MySQL as the online system, in an embodiment of the method of the present invention for parallel batch import of data into a read-only query system. With Cassandra, the high-load ratio is below 20%. The experimental analysis indicates that high load occurs mainly during the parallel data import phase, and spending 20% of the time under high load has little impact on the online system. With MySQL, indexes are built on the online system itself, so the high-load ratio exceeds 80%; that is, the entire data processing stage has a relatively large impact on the online system.
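The high-load ratio discussed in Embodiment 2 can be read as the fraction of the measured run during which CPU load exceeds the 80% threshold. The following is a minimal sketch of that calculation, assuming CPU load is sampled at regular intervals; the function name `high_load_ratio` and the sample values are hypothetical and are not measurements from the patent.

```python
def high_load_ratio(cpu_samples, threshold=80.0):
    """Return the fraction of samples whose CPU load is above `threshold`.

    Assumes `cpu_samples` holds CPU load percentages taken at equal
    time intervals, so the sample fraction approximates the time fraction.
    """
    if not cpu_samples:
        raise ValueError("at least one CPU sample is required")
    high = sum(1 for load in cpu_samples if load > threshold)
    return high / len(cpu_samples)


# Hypothetical samples: two of the five readings are above 80%.
samples = [10.0, 85.0, 90.0, 50.0, 30.0]
ratio = high_load_ratio(samples)
print(f"high-load ratio: {ratio:.0%}")
```

Under this reading, a ratio below 0.2 corresponds to the Cassandra case described above and a ratio above 0.8 to the MySQL case.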

[0064] The above embodiments illustrate that the present invention improves the efficiency of the entire da...


Abstract

The invention relates to a method and a system for parallel batch importing of data into a read-only query system. The system comprises a coordinator, a Hadoop driver module, and an online query system driver module. The method includes: 1, receiving a request for original characteristic data and forwarding the request to Hadoop for processing; 2, establishing a Map / Reduce task on Hadoop according to node mapping rules, and generating data blocks of the characteristic data at each reducer task node; and 3, placing the data blocks at the corresponding Hadoop nodes, with the read-only query system reading the data blocks on the corresponding nodes in parallel. The method and the system have two advantages. First, the characteristic data is processed through the parallel framework of Hadoop, which avoids the consumption of online system computing resources by index computation and replica computation during the import from the offline system to the online system. Second, the data blocks produced by Hadoop are read into the online system in parallel and asynchronously in the background, which reduces the impact on the online system.
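The three steps in the abstract can be illustrated with a small single-process sketch. This is not the patented implementation and does not use Hadoop or Cassandra: the offline Map / Reduce job is stood in for by a plain grouping function, the node mapping rule is assumed to be a CRC32 hash of the key modulo the node count, and each online node is modelled as an in-memory dictionary. All names (`node_for_key`, `build_blocks`, `load_block`, `parallel_import`) are hypothetical.

```python
import zlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def node_for_key(key, num_nodes):
    """Assumed node mapping rule: stable hash of the key modulo node count."""
    return zlib.crc32(key.encode("utf-8")) % num_nodes


def build_blocks(records, num_nodes):
    """Offline step (stand-in for the Map / Reduce job).

    Groups records by target node and sorts each block by key, so the
    online system does not have to do this work during import.
    """
    blocks = defaultdict(list)
    for key, value in records:
        blocks[node_for_key(key, num_nodes)].append((key, value))
    return {node: sorted(rows) for node, rows in blocks.items()}


def load_block(store, rows):
    """Online step for one node: ingest a prebuilt block as-is."""
    store.update(rows)
    return len(rows)


def parallel_import(records, num_nodes):
    """Build per-node blocks offline, then load all blocks concurrently."""
    blocks = build_blocks(records, num_nodes)
    stores = {node: {} for node in range(num_nodes)}
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        futures = [
            pool.submit(load_block, stores[node], rows)
            for node, rows in blocks.items()
        ]
        loaded = sum(future.result() for future in futures)
    return stores, loaded


records = [(f"user{i}", i) for i in range(100)]
stores, loaded = parallel_import(records, num_nodes=4)
print(loaded, [len(stores[n]) for n in range(4)])
```

The design point the sketch tries to capture is that partitioning and ordering happen before the data reaches the online nodes, so each node only ingests its own block and all nodes can do so at the same time.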

Description

Technical field

[0001] The invention relates to a data import method, in particular to a method for efficiently importing offline data into an online read-only query system in batches, and belongs to the field of cloud computing.

Background

[0002] Data on the Internet is growing explosively, and it may contain associations that are not easy to find and that require data mining to uncover. For example, a huge amount of information is hidden in the massive data of social networking and e-commerce sites. In social networks, "friends you may know" is a typical application scenario: the application can calculate the closeness of two people based on the number of direct mutual friends and the number of second-hop friends between them, and recommend friends accordingly. On an e-commerce website, the application can calculate a customer's shopping pattern and preferences according to the type, style, price and other factors of the items ...


Application Information


IPC(8): G06F17/30, G06F9/44

Inventor: 申林, 薛继龙, 杨智, 代亚非

Owner: PEKING UNIV
