A big data file rapid parallel extraction method based on memory mapping

A memory-mapping-based extraction method, applied in the field of fast parallel extraction of large data files. It addresses the problem that the file content to be read or processed cannot be determined in advance, and achieves good scalability together with improved performance and efficiency.

Inactive Publication Date: 2019-05-28
苏州华必讯信息科技有限公司

AI Technical Summary

Problems solved by technology

[0027] (2) Quick extraction and processing of the second type of big data files
[0028] The second type of problem is that when accessing a file, the content of the file to be read or processed cannot be determined at the beginning of th...




Embodiment Construction

[0093] The present invention aims to provide a general and efficient solution to the problem of reading large data files by combining multi-core technology with memory-mapped file technology, without increasing hardware cost. The core problem is to improve the efficiency with which an application reads and processes large files of several GB, and to break through the efficiency bottleneck of the original memory-mapped file method by making appropriate use of the multi-core environment. The proposed solution also addresses the generality problem of reading large data files.

[0094] The present invention adjusts the traditional loop mapping method as follows. The loop mapping technique tends to allocate equal tasks to the processors one by one, that is, the task units allocated each time are basically equal. The present invention divides tasks ...
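For reference, the traditional loop (cyclic) mapping that this paragraph takes as its starting point can be sketched as follows. This is only an illustration of the baseline, assuming tasks are numbered from 0 and handed out one by one in round-robin order; the function name `cyclic_mapping` and its parameters are hypothetical. The invention's own task division is elided in the text above and is not reproduced here.

```python
def cyclic_mapping(num_tasks, num_processors):
    """Baseline loop mapping: task i goes to processor i % num_processors.

    Every allocation hands out one equally sized task unit, so each
    processor ends up with a nearly equal share of the tasks.
    """
    assignment = [[] for _ in range(num_processors)]
    for task_id in range(num_tasks):
        assignment[task_id % num_processors].append(task_id)
    return assignment
```

With 7 tasks and 3 processors, processor 0 receives tasks 0, 3 and 6, processor 1 receives 1 and 4, and processor 2 receives 2 and 5.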



Abstract

The invention discloses a memory-mapping-based method for rapid parallel extraction of big data files, comprising the following steps: generating a task domain, in which the task domain is formed from task blocks, the task blocks being the elements of the task domain; generating a task pool, in which sub-task domains are merged from elements of the task domain on the principle of low communication cost, the resulting set of elements serves as the task pool for task scheduling, and processors extract tasks for execution according to the scheduling selection; scheduling tasks, in which the scheduling granularity is decided according to the number of remaining tasks, and tasks meeting the requirements are extracted from the task pool and prepared for mapping; and mapping to processors, in which the extracted tasks are mapped to currently idle processors for execution. The method exploits the multi-core advantage and improves memory file-mapping efficiency. It can be applied to reading large single files with a capacity below 4GB, effectively increasing the file reading speed and the I/O throughput of disk files.
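The four steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation, and several choices are assumptions made for illustration: task blocks are fixed-size byte ranges, "low communication cost" is approximated by merging adjacent blocks into one contiguous range, the granularity rule (remaining blocks divided by twice the worker count) is made up, threads stand in for processors, and the per-task work is counting newline bytes. All function names are hypothetical.

```python
import mmap
import threading
from concurrent.futures import ThreadPoolExecutor


def build_task_domain(file_size, block_size):
    """Task domain: one (offset, length) task block per fixed-size slice."""
    return [(off, min(block_size, file_size - off))
            for off in range(0, file_size, block_size)]


def take_batch(pool, lock, workers):
    """Scheduling: pick a batch whose size depends on how many blocks remain.

    Assumed rule: take remaining // (2 * workers) blocks, at least one, so
    batches are large at first and shrink toward the end of the file.
    Adjacent blocks are merged into a single contiguous (offset, length).
    """
    with lock:
        if not pool:
            return None
        n = max(1, len(pool) // (2 * workers))
        batch = pool[:n]
        del pool[:n]
    start = batch[0][0]
    length = sum(size for _, size in batch)
    return start, length


def count_newlines(path, workers=4, block_size=64 * 1024):
    """Map the file once, then let idle workers pull batches from the pool."""
    with open(path, "rb") as f, \
            mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        pool = build_task_domain(len(mm), block_size)  # task pool
        lock = threading.Lock()

        def worker():
            total = 0
            while True:
                task = take_batch(pool, lock, workers)
                if task is None:  # pool exhausted
                    return total
                start, length = task
                # Read the mapped range and do the per-task work.
                total += mm[start:start + length].count(b"\n")

        with ThreadPoolExecutor(max_workers=workers) as executor:
            futures = [executor.submit(worker) for _ in range(workers)]
            return sum(fut.result() for fut in futures)
```

Because each worker asks for more work only when it becomes idle, a worker that finishes early simply takes the next batch. In CPython the threads mainly illustrate the scheduling structure; real multi-core speedup would need processes or a runtime without a global interpreter lock.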

Description

Technical field

[0001] The present invention relates to a data processing technique, and specifically to a method for fast parallel extraction of big data files based on memory mapping.

Background technique

[0002] With the development of multi-core computers, multi-core PCs can already complete many large-scale computing tasks, and complex calculations often involve a large number of data files. Applications inevitably process several GB of data files at a time. Currently, when such large data files are processed, reading data from main and auxiliary storage often becomes the bottleneck that limits application speed, so the superior hardware performance of the multi-core system cannot be well utilized. Existing memory-mapped file technology is implemented for the traditional single-core environment, and has the disadvantages of low efficiency and poor versatility in the processing of large data ...


Application Information

IPC(8): G06F16/23, H04L29/08
Inventor 赵乔
Owner 苏州华必讯信息科技有限公司