Data importing method and system for distributed sequence list

A data import technology for distributed sequence tables, classified under electrical digital data processing, special data processing applications, instruments, etc. It addresses the long import time and low efficiency of importing data, and improves import speed by saving fragment positioning time.

Active Publication Date: 2013-05-01
北京东方国信科技股份有限公司

AI Technical Summary

Problems solved by technology

Because the data volume is large, importing through the interface provided by DOT requires locating the target Region fragment level by level, from top to bottom, which makes the import slow and inefficient.


Examples


Embodiment 1

[0029] Figure 1 is a flowchart of the data import method of the distributed sequence table according to the first embodiment of the present invention. As shown in Figure 1, the method includes:

[0030] Step S101: Use the Map function to convert the data to be imported into key-value pairs;

[0031] In each key-value pair, the key is the primary key of the distributed sequence table, and the value is the data content corresponding to that key. The data to be imported can take any form, such as a text string or a binary sequence. The Map function receives the data to be imported and converts it into a number of key-value pairs for output.
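The conversion in step S101 can be illustrated with a minimal sketch. It assumes, purely for illustration, that each input record is a text line of the form `<primary key><TAB><content>`; the function name `map_record` and the tab separator are hypothetical and are not taken from the patent.

```python
from typing import Iterator, Tuple


def map_record(line: str, sep: str = "\t") -> Iterator[Tuple[bytes, bytes]]:
    """Convert one input record into (key, value) pairs.

    Assumption: the record is text shaped like "<primary key><sep><content>".
    The key becomes the table's primary key; the value is the data content.
    """
    line = line.rstrip("\n")
    if not line:
        return  # skip empty records
    key, _, value = line.partition(sep)
    yield key.encode("utf-8"), value.encode("utf-8")


# Hypothetical input records, including one empty line that is skipped.
records = ["user42\talice,30\n", "user07\tbob,25\n", "\n"]
pairs = [kv for rec in records for kv in map_record(rec)]
```

Emitting keys and values as bytes keeps the sketch agnostic to whether the source data was a text string or a binary sequence.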

[0032] Step S102: Sort the key-value pairs according to keys;

[0033] All the key-value pairs generated in step S101 are sorted according to the key, so as to ensure the glob...
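As a small illustration of step S102, the sketch below sorts key-value pairs by key. It assumes keys are compared as raw bytes in lexicographic order, which is an assumption of this example rather than something the text states; the sample keys are hypothetical.

```python
# Hypothetical key-value pairs as produced by the Map step.
pairs = [(b"row2", b"b"), (b"row10", b"c"), (b"row1", b"a")]

# Sort by key only. Python compares bytes lexicographically, so b"row10"
# sorts before b"row2" because the byte "1" is smaller than the byte "2".
sorted_pairs = sorted(pairs, key=lambda kv: kv[0])
sorted_keys = [key for key, _ in sorted_pairs]
```

Under this assumption, numeric key parts would need zero padding (for example `row02`, `row10`) if numeric order is wanted.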

Embodiment 2

[0045] To further improve the import speed, the first embodiment can be extended in two ways: sorting the input key-value pairs of each Reduce function and then performing a merge operation, and sampling and analyzing the original data. Figure 3 is a flowchart of the data import method of the distributed sequence table described in this embodiment. As shown in Figure 3, the improved method includes:

[0046] Step S301: Sample and analyze the data to be imported;

[0047] To fragment the keys evenly in step S305, and to balance the load across the data storage files that are finally written, the method may further include, before the data to be imported is converted into key-value pairs, using a sampling function to sample and analyze the original data. This provides a balanced set of split intervals as a reference for subsequent steps. For example, in step S305, the key-valu...
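One way to picture the sampling step is the sketch below, which draws a random sample of keys and takes evenly spaced quantiles of the sorted sample as split points. The helper names `split_points` and `partition_for`, the sample size, and the quantile rule are assumptions made for illustration; the text does not specify which sampling function is used.

```python
import random
from bisect import bisect_right
from collections import Counter
from typing import List, Sequence


def split_points(keys: Sequence[bytes], num_partitions: int,
                 sample_size: int, seed: int = 0) -> List[bytes]:
    """Estimate num_partitions - 1 split keys from a random sample."""
    rng = random.Random(seed)
    sample = sorted(rng.sample(list(keys), min(sample_size, len(keys))))
    # Evenly spaced quantiles of the sorted sample become interval boundaries.
    return [sample[len(sample) * i // num_partitions]
            for i in range(1, num_partitions)]


def partition_for(key: bytes, points: List[bytes]) -> int:
    """Return the index of the fragment whose key interval contains key."""
    return bisect_right(points, key)


# Hypothetical data: 1000 distinct keys split into 4 fragments.
keys = [b"k%04d" % i for i in range(1000)]
points = split_points(keys, num_partitions=4, sample_size=200)
counts = Counter(partition_for(k, points) for k in keys)
```

With a reasonably sized sample, the fragment sizes in `counts` come out roughly equal, which is the balance the sampling step aims for.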

Embodiment 3

[0061] Figure 4 is a structural block diagram of the data import system of the distributed sequence table described in this embodiment. As shown in Figure 4, the system includes:

[0062] The key-value pair conversion module 401 is used to convert the data to be imported into key-value pairs using the Map function;

[0063] In each key-value pair, the key is the primary key of the distributed sequence table, and the value is the data content corresponding to that key. The data to be imported can take any form, such as a text string or a binary sequence. The Map function receives the data to be imported and converts it into a number of key-value pairs for output.

[0064] The sorting module 402 is configured to sort the key-value pairs generated by the key-value pair conversion mo...

Abstract

The invention discloses a data import method and system for a distributed sequence table. The method comprises the following steps: S1, converting the data to be imported into key-value pairs using a Map function; S2, sorting the key-value pairs by key; S3, fragmenting the sorted key-value pairs and distributing each fragment to a Reduce function; S4, having each Reduce function convert its assigned fragment into the format required by the underlying data storage file of the distributed sequence table and write the converted fragment into that file; and S5, loading the underlying data storage files into the distributed sequence table. Because the data to be imported is written directly into the underlying data storage files of the distributed sequence table, the fragment positioning time is saved and the import is faster.
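The five steps in the abstract can be sketched end to end in a single process. This is a simplified illustration, not the patented implementation: it does not use a real Map/Reduce framework, the JSON-lines file layout stands in for the unspecified underlying storage format, and every name (`map_fn`, `fragment`, `reduce_fn`, `load`, `bulk_import`) is hypothetical.

```python
import json
import tempfile
from bisect import bisect_right
from pathlib import Path
from typing import Dict, Iterable, List, Tuple

KV = Tuple[str, str]


def map_fn(record: str) -> KV:
    """S1: convert a record (assumed "<key>\\t<value>") into a key-value pair."""
    key, _, value = record.partition("\t")
    return key, value


def fragment(sorted_pairs: List[KV], points: List[str]) -> List[List[KV]]:
    """S3: split sorted pairs into key ranges bounded by the split points."""
    frags: List[List[KV]] = [[] for _ in range(len(points) + 1)]
    for kv in sorted_pairs:
        frags[bisect_right(points, kv[0])].append(kv)
    return frags


def reduce_fn(frag: List[KV], out_dir: Path, idx: int) -> Path:
    """S4: convert one fragment to the (assumed) storage format and write it."""
    path = out_dir / f"fragment-{idx:05d}.jsonl"
    with path.open("w", encoding="utf-8") as f:
        for key, value in frag:
            f.write(json.dumps({"key": key, "value": value}) + "\n")
    return path


def load(table: Dict[str, List[Path]], name: str, files: Iterable[Path]) -> None:
    """S5: register the finished storage files with the table."""
    table.setdefault(name, []).extend(files)


def bulk_import(records: Iterable[str], points: List[str], out_dir: Path,
                table: Dict[str, List[Path]], name: str) -> List[Path]:
    pairs = sorted((map_fn(r) for r in records), key=lambda kv: kv[0])  # S1, S2
    frags = fragment(pairs, points)                                     # S3
    files = [reduce_fn(f, out_dir, i) for i, f in enumerate(frags) if f]  # S4
    load(table, name, files)                                            # S5
    return files


with tempfile.TemporaryDirectory() as tmp:
    table: Dict[str, List[Path]] = {}
    files = bulk_import(["c\t3", "a\t1", "m\t5", "z\t9"], ["k"],
                        Path(tmp), table, "demo")
    first_keys = [json.loads(line)["key"]
                  for line in files[0].read_text(encoding="utf-8").splitlines()]
```

The point of the sketch is that no per-record lookup of the target fragment takes place: each file already covers one contiguous, sorted key range, so the final step only has to hand whole files to the table.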

Description

Technical field

[0001] The present invention relates to the technical field of distributed information processing, and in particular to a data import method and system for a distributed sequence table.

Background

[0002] As the volume of network application data continues to grow, higher demands are placed on the access performance, storage overhead, and reliability of data storage systems. The Distributed Ordered Table (DOT) is a database system well suited to massive data (TB to PB scale). Because the data volume is large, importing through the interface provided by DOT requires locating the target Region fragment level by level, from top to bottom, which makes the import slow and inefficient.

Summary of the invention

[0003] The main purpose of the present invention is to provide a distributed sequence table data import technology based on the Map/Reduce distributed computing framework, which can meet the demand for massive data impo...


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/30
Inventor: 刘佳, 万浩, 查礼
Owner: 北京东方国信科技股份有限公司