Method and system for parallel batch importing of data into read-only query system

Read-only query system technology, applied in the field of cloud computing, addresses problems such as low import efficiency, heavy consumption of CPU and memory resources, and reduced service throughput of the online system. Its effects are to enhance service capability, reduce switching time, and avoid consumption of computing resources.

Inactive Publication Date: 2013-09-11
PEKING UNIV

AI Technical Summary

Problems solved by technology

This approach is simple and reliable, but the data import process has two problems: 1. The data is generally copied serially through the read-write interface of the online system. At any given point in time, only a limited number of nodes in the online system are perfo...

Examples

Embodiment 1

[0059] Raw data ranging from 256 MB to 1280 MB is processed with Cassandra and MySQL, respectively, as the online system. Timing starts when processing begins and stops once the processed data has been completely imported into the Cassandra or MySQL online system and that system is providing service.

[0060] Figure 6 compares data size against processing time with Cassandra and MySQL as the online system, in an embodiment of the method for parallel batch importing of data into a read-only query system. The Cassandra system with the present invention deployed takes less time to process and import data than MySQL at every data scale and, because processing is parallel, does not suffer the slowdown that MySQL shows as data accumulates.

Embodiment 2

[0062] Building on Embodiment 1, the time the online system spends under high load is measured (a CPU load above 80% is considered high load).

[0063] Figure 7 compares data size against time spent under high CPU load with Cassandra and MySQL as the online system, in an embodiment of the method for parallel batch importing of data into a read-only query system. Cassandra's high-load ratio is below 20%. Experimental analysis shows that the high load occurs mainly in the parallel data import phase, and spending 20% of the time under high load has little impact on the online system. Because MySQL builds its indexes on the online system, its high-load ratio is above 80%; that is, the entire data processing stage has a relatively large impact on the online system.
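The high-load ratio discussed here can be illustrated with a minimal sketch. It assumes CPU load is sampled at equal intervals over the run, so the share of samples above the 80% threshold approximates the share of time under high load. The function name and the sample values are hypothetical and are not taken from the patent's experiments.

```python
def high_load_ratio(cpu_samples, threshold=80.0):
    """Return the fraction of CPU samples strictly above the threshold.

    cpu_samples: CPU load percentages taken at equal time intervals
                 (an assumption of this sketch).
    threshold:   load above this value counts as high load (80% in the text).
    """
    if not cpu_samples:
        raise ValueError("cpu_samples must not be empty")
    high = sum(1 for load in cpu_samples if load > threshold)
    return high / len(cpu_samples)


# Hypothetical samples: two of the ten readings exceed 80%.
samples = [35, 90, 85, 40, 50, 60, 45, 30, 55, 20]
ratio = high_load_ratio(samples)
```

With these made-up samples the ratio is 0.2, which is the boundary the text mentions for the Cassandra case.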

[0064] The above embodiments illustrate that the present invention improves the efficiency of the entire da...

Abstract

The invention relates to a method and a system for parallel batch importing of data into a read-only query system. The system comprises a coordinator, a Hadoop driver module, and an online query system driver module. The method includes: 1, receiving a request for original characteristic data and forwarding the request to Hadoop for processing; 2, establishing a Map/Reduce task on Hadoop according to node mapping rules and generating data blocks of the characteristic data at each reducer task node; and 3, placing the data blocks on the corresponding Hadoop nodes, from which the read-only query system reads them in parallel. The method and the system have the following advantages: the characteristic data is processed through the parallel framework of Hadoop, which avoids the consumption of online system computing resources by index calculation and copy calculation when importing from the offline system to the online system; and the data blocks produced by Hadoop are read into the online system in parallel and asynchronously in the background, which reduces the impact on the online system.
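The three steps in the abstract can be sketched in plain Python. This is only an illustration under stated assumptions, not the patented implementation: it uses no Hadoop or Cassandra API, the node mapping rule (a CRC32 hash of the key modulo the node count) is hypothetical, and in-memory dictionaries stand in for the per-node stores of the online system.

```python
import zlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def node_for_key(key, num_nodes):
    """Hypothetical node mapping rule: stable hash of the key modulo node count."""
    return zlib.crc32(key.encode("utf-8")) % num_nodes


def build_blocks(records, num_nodes):
    """Offline phase, standing in for the Map/Reduce job.

    Map: route each (key, value) record to its target node.
    Reduce: sort each node's records by key so the block arrives pre-indexed.
    """
    grouped = defaultdict(list)
    for key, value in records:
        grouped[node_for_key(key, num_nodes)].append((key, value))
    return {node: sorted(items) for node, items in grouped.items()}


def load_block(store, block):
    """Online phase: one node reads its pre-built block into its local store."""
    store.update(block)
    return len(block)


def parallel_import(blocks, num_nodes):
    """Let every node load its own block concurrently; return stores and row count."""
    stores = [dict() for _ in range(num_nodes)]
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        futures = [pool.submit(load_block, stores[n], b) for n, b in blocks.items()]
        loaded = sum(f.result() for f in futures)
    return stores, loaded


# Hypothetical input: 100 records spread over 4 nodes.
records = [(f"user{i}", i) for i in range(100)]
blocks = build_blocks(records, num_nodes=4)
stores, loaded = parallel_import(blocks, num_nodes=4)
```

Because the sorting and partitioning happen in `build_blocks`, the loading step only copies finished blocks, which mirrors the abstract's point that index calculation is kept off the online system.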

Description

Technical field

[0001] The invention relates to a data import method, in particular to a method for efficiently importing offline data into an online read-only query system in batches, and belongs to the field of cloud computing.

Background

[0002] Data on the Internet is growing explosively, and hidden in this data are associations that are not easy to find and must be uncovered through data mining. For example, a huge amount of information lies behind the massive data of social networking and e-commerce sites. In social networks, "friends you may know" is a typical application scenario: the application can calculate the closeness of two people from the number of their direct mutual friends and second-hop friends, and recommend friends accordingly. On an e-commerce website, the application can calculate a customer's shopping patterns and preferences from the type, style, price and other factors of the items ...

Application Information

IPC(8): G06F17/30, G06F9/44
Inventor: 申林, 薛继龙, 杨智, 代亚非
Owner: PEKING UNIV