Method for rapidly extracting massive data files in parallel based on memory mapping

A memory-mapping-based extraction method, applied to the fast parallel extraction of large data files. It addresses the problem of file content that cannot be determined in advance, and achieves good scalability together with improved performance and efficiency.

Inactive Publication Date: 2011-11-02
NORTH CHINA UNIVERSITY OF TECHNOLOGY

AI Technical Summary

Problems solved by technology

[0027] (2) Fast extraction and processing of the second type of large data file
[0028] In the second type of problem, the part of the file to be read or processed cannot be determined when the program starts. Typical cases are searching for data in the file or randomly accessing a part of it.
For such problems, a static load balancing method can no longer balance the load well across the processors.
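
The search case described above can be illustrated with a minimal sketch in which workers pull blocks from a shared queue instead of receiving a fixed share up front. This is not the patented method, only an example of dynamic assignment over a memory-mapped file. The name `parallel_find` and its parameters are hypothetical, and CPython threads are used for brevity, so the example shows the scheduling pattern rather than real multi-core speedup.

```python
import mmap
import queue
import threading

def parallel_find(path, pattern, block_size=1 << 20, num_workers=4):
    """Return the sorted offsets of every occurrence of `pattern` in the file.

    Hypothetical helper: blocks are handed out dynamically, so a worker that
    finishes early simply takes the next block. The file must be non-empty,
    because an empty file cannot be memory-mapped.
    """
    if not pattern:
        raise ValueError("pattern must be non-empty")
    hits = []
    lock = threading.Lock()
    tasks = queue.Queue()
    with open(path, "rb") as f, \
            mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        size = len(mm)
        for start in range(0, size, block_size):
            tasks.put(start)

        def worker():
            while True:
                try:
                    start = tasks.get_nowait()
                except queue.Empty:
                    return  # no blocks left: this worker is done
                # Extend the window by len(pattern) - 1 bytes so a match that
                # straddles the block edge is found, while a match that begins
                # in the next block is left to that block's worker.
                end = min(size, start + block_size + len(pattern) - 1)
                found = []
                pos = mm.find(pattern, start, end)
                while pos != -1:
                    found.append(pos)
                    pos = mm.find(pattern, pos + 1, end)
                with lock:
                    hits.extend(found)

        threads = [threading.Thread(target=worker) for _ in range(num_workers)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
    return sorted(hits)
```

Because no worker owns a fixed region, a block that happens to contain many matches delays only the worker that took it.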


Examples

Embodiment Construction

[0093] The present invention aims to provide a general and efficient solution to the problem of reading large data files by combining multi-core technology with memory-mapped file technology, without increasing hardware cost. The core problem is to improve application efficiency when reading and processing large files of up to several GB, and to break through the efficiency bottleneck of the original memory-mapped file method by making proper use of the multi-core environment. The proposed solution also addresses the generality problem of reading large data files.

[0094] The present invention adjusts the traditional loop mapping method as follows. Loop mapping allocates tasks to the processors one by one in turn, and the task units allocated each time are basically equal. The present invention divides tasks bas...
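
For reference, the traditional loop mapping that the paragraph starts from can be sketched as a round-robin deal of equal-sized blocks. This shows only the baseline, not the adjustment made by the invention, which is cut off in the text above. The function name `loop_mapping` is hypothetical.

```python
def loop_mapping(num_blocks, num_processors):
    """Baseline only: block i goes to processor i % num_processors."""
    assignment = {p: [] for p in range(num_processors)}
    for block in range(num_blocks):
        assignment[block % num_processors].append(block)
    return assignment
```

Every processor receives the same unit of work on each turn, whatever the blocks actually cost to process.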

Abstract

The invention discloses a method for rapidly extracting massive data files in parallel based on memory mapping. The method comprises the following steps. Generating a task domain: the task domain is formed from task blocks, which are its elements. Generating a task pool: sub-task domains are merged from the elements of the task domain according to a rule of low communication cost, and the resulting set of elements serves as the task pool for scheduling, from which the tasks to be executed by a processor are extracted according to the scheduling selection. Scheduling the tasks: the scheduling granularity is determined from the remaining quantity of tasks, and the tasks that meet the requirements are extracted from the task pool and prepared for mapping. Mapping a processor: the extracted tasks are mapped to a currently idle processor for execution. The disclosed method exploits the advantages of multi-core processors and increases the efficiency of memory-mapped file access. It is applicable to reading a single massive file with a capacity below 4 GB, effectively increases the reading speed for this kind of file, and increases the I/O (input/output) throughput of disk files.
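
The four steps in the abstract can be sketched roughly as follows. All function names are hypothetical, and two rules are assumptions rather than details from the patent: "low communication cost" is approximated by merging blocks that are adjacent in the file, and the scheduling granularity is taken as `ceil(remaining / (2 * num_workers))` blocks, a guided-scheduling style formula chosen only for illustration. Processing time is simulated as proportional to task length.

```python
import heapq
import math
from collections import deque

def generate_task_domain(file_size, block_size):
    """Step 1: split [0, file_size) into task blocks of (offset, length)."""
    return [(off, min(block_size, file_size - off))
            for off in range(0, file_size, block_size)]

def generate_task_pool(task_domain):
    """Step 2: order blocks by offset so adjacent ones can be merged (assumed rule)."""
    return deque(sorted(task_domain))

def next_task(pool, num_workers):
    """Step 3: take a run of adjacent blocks; the run shrinks as the pool drains."""
    count = max(1, math.ceil(len(pool) / (2 * num_workers)))  # assumed formula
    blocks = [pool.popleft() for _ in range(count)]
    return blocks[0][0], sum(length for _, length in blocks)

def schedule(file_size, block_size, num_workers):
    """Step 4: hand each extracted task to whichever worker becomes idle first."""
    pool = generate_task_pool(generate_task_domain(file_size, block_size))
    idle = [(0, w) for w in range(num_workers)]  # (time the worker is free, id)
    heapq.heapify(idle)
    plan = []
    while pool:
        free_at, worker = heapq.heappop(idle)
        start, length = next_task(pool, num_workers)
        plan.append((worker, start, length))
        # Simulated cost: a task occupies the worker for `length` time units.
        heapq.heappush(idle, (free_at + length, worker))
    return plan
```

Each `(worker, start, length)` entry in the plan describes a file region that the worker would map and process. Large tasks at the start keep scheduling overhead low, and small tasks near the end let the workers finish at close to the same time.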

Description

Technical field

[0001] The invention relates to data processing technology, in particular to a method for fast parallel extraction of large data files based on memory mapping.

Background art

[0002] With the development of multi-core computers, multi-core PCs can already complete many large-scale computing tasks, and complex calculations often involve a large number of data files. Applications will inevitably process data files of several GB at a time. When such large files are processed, reading data in memory and auxiliary storage often becomes the bottleneck that limits the running speed of the application, so that the superior hardware performance of the multi-core system cannot be well utilized. Existing memory-mapped file technology is implemented entirely in the traditional single-core environment, and suffers from low efficiency and poor versatility when processing large data files, that is, the ordinary m...
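
As background for the memory-mapped file technology mentioned above, the following minimal sketch maps only a window of a file instead of the whole file, which is how a single task region might be accessed. The helper name `read_window` is hypothetical. The alignment step reflects a real constraint of Python's `mmap` module: the mapping offset must be a multiple of `mmap.ALLOCATIONGRANULARITY`.

```python
import mmap
import os

def read_window(path, offset, length):
    """Return up to `length` bytes starting at `offset`, mapping only that region."""
    gran = mmap.ALLOCATIONGRANULARITY
    aligned = (offset // gran) * gran  # round down to an allowed mapping offset
    delta = offset - aligned           # where the requested data starts in the view
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        length = min(length, size - offset)  # never map past the end of the file
        if length <= 0:
            return b""
        with mmap.mmap(f.fileno(), delta + length,
                       access=mmap.ACCESS_READ, offset=aligned) as mm:
            return mm[delta:delta + length]
```

Mapping windows rather than the whole file keeps the address space needed per task small, which matters for files of several GB.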

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F9/46, G06F17/30
Inventor: 马礼, 李敬哲, 杜春来, 马东超
Owner: NORTH CHINA UNIVERSITY OF TECHNOLOGY