Index generation method and retrieval method for scientific big data

A data indexing and big data technology, applied in database indexing, structured data retrieval, digital data information retrieval, and related fields. It addresses the problem that the disk access rate is far lower than computing speed, with the effects of improving retrieval efficiency, reducing disk access overhead, and optimizing data block size.

Active Publication Date: 2019-11-12
SUN YAT SEN UNIV

AI Technical Summary

Problems solved by technology

Compared with a computing speed of hundreds of millions of operations per second, disk access is still relatively slow: the disk access rate is much lower than the speed of in-memory computation.


Examples


Embodiment 1

[0052] This embodiment provides an index generation method for scientific big data, comprising the following steps:

[0053] Determine, according to the heat (access popularity) of each data block, which of the data blocks are hot data blocks;

[0054] Merge each hot data block with the data blocks adjacent to it, according to how continuous the hot data blocks are;

[0055] Generate a data index, or update the original data index, according to the merged data blocks.

[0056] Hot data blocks are determined from the real-time heat of each data block, and each hot data block is merged with its adjacent data blocks according to the continuity of the hot data blocks. Specifically: when the hot data blocks are relatively discontinuous, each hot data block is merged with its adjacent non-hot data blocks; when the hot data blocks are relatively continuous, each hot data block is merged with the non-hot data blocks and hot data blocks adjacent to...
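The three steps of this embodiment can be sketched in Python as follows. This is a minimal illustration, not the patented procedure: paragraph [0056] is truncated in this listing, so the exact merge rule is unknown. The sketch assumes that heat is an access count, that a block is hot when its heat reaches a caller-supplied threshold, and that every run of consecutive hot blocks absorbs one neighbouring block on each side (so an isolated hot block merges with its non-hot neighbours, and a continuous run merges its hot members plus the bordering non-hot blocks). The names `Block`, `merge_hot_blocks`, and `hot_threshold` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Block:
    start: int  # first record offset covered by the block (inclusive)
    end: int    # end record offset (exclusive)
    heat: int   # access count, used here as a stand-in for "heat"


def merge_hot_blocks(blocks, hot_threshold):
    """Merge each run of hot blocks with its neighbours (assumed rule).

    `blocks` must be ordered by offset and contiguous.
    """
    n = len(blocks)
    hot = [b.heat >= hot_threshold for b in blocks]

    # Step 1 and 2: find runs of hot blocks and widen each run by one
    # block on both sides; widened runs that touch are fused together.
    ranges = []
    i = 0
    while i < n:
        if not hot[i]:
            i += 1
            continue
        j = i
        while j + 1 < n and hot[j + 1]:
            j += 1
        lo, hi = max(0, i - 1), min(n - 1, j + 1)
        if ranges and lo <= ranges[-1][1]:
            ranges[-1][1] = hi
        else:
            ranges.append([lo, hi])
        i = j + 1

    # Step 3: emit the new block list; it doubles as the block index
    # because each entry records the offset range it covers.
    merged, pos = [], 0
    for lo, hi in ranges:
        merged.extend(blocks[pos:lo])  # untouched cold blocks
        group = blocks[lo:hi + 1]
        merged.append(
            Block(group[0].start, group[-1].end, sum(b.heat for b in group))
        )
        pos = hi + 1
    merged.extend(blocks[pos:])
    return merged


if __name__ == "__main__":
    heats = [1, 9, 1, 1, 1, 8, 9, 1]
    blocks = [Block(i * 10, (i + 1) * 10, h) for i, h in enumerate(heats)]
    for b in merge_hot_blocks(blocks, hot_threshold=5):
        print(b)
```

With the sample heats above, the isolated hot block at offset 10 and the continuous hot run at offsets 50 to 70 each become one larger block, while the cold block at offset 30 stays as it was.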

Embodiment 2

[0100] This embodiment provides a retrieval method for scientific big data, which includes the index generation method for scientific big data described in Embodiment 1 and the following additional step:

[0101] Retrieve data based on the data index.

[0102] Hot data blocks are dynamically merged with their adjacent data blocks according to the continuity of the hot data blocks, which optimizes the data block size. The data index is generated or updated in real time from the dynamically merged data blocks, and scientific big data is retrieved based on that index. This prevents data blocks from being so large that too much redundant information is read from disk during retrieval, which would increase data filtering overhead, and it also prevents data blocks from being so small that disk access overhead increases during retrieval. It makes full use of computer computing resources, and greatly im...
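The trade-off described in [0102] can be made concrete with a small Python sketch of retrieval over a block index. It is an illustration under stated assumptions, not the method of the patent: the index is assumed to be a sorted list of half-open `(start, end)` offset ranges, the number of blocks touched stands in for disk access overhead, and the number of records read that fall outside the query range stands in for data filtering overhead. The function names `build_index`, `retrieve`, and `read_cost` are hypothetical.

```python
from bisect import bisect_right


def build_index(blocks):
    """Return (starts, index) for non-overlapping (start, end) blocks."""
    index = sorted(blocks)
    starts = [s for s, _ in index]
    return starts, index


def retrieve(starts, index, lo, hi):
    """Return the blocks that overlap the query range [lo, hi)."""
    # Binary search for the last block whose start is <= lo.
    i = max(bisect_right(starts, lo) - 1, 0)
    hits = []
    while i < len(index) and index[i][0] < hi:
        if index[i][1] > lo:
            hits.append(index[i])
        i += 1
    return hits


def read_cost(hits, lo, hi):
    """Return (blocks_read, redundant_records) for a query (assumed model)."""
    records_read = sum(e - s for s, e in hits)
    wanted = sum(min(e, hi) - max(s, lo) for s, e in hits)
    return len(hits), records_read - wanted


if __name__ == "__main__":
    layouts = {
        "small blocks": [(i, i + 10) for i in range(0, 80, 10)],
        "merged blocks": [(0, 30), (30, 40), (40, 80)],
        "one large block": [(0, 80)],
    }
    for name, blocks in layouts.items():
        starts, index = build_index(blocks)
        hits = retrieve(starts, index, 42, 68)
        print(name, read_cost(hits, 42, 68))
```

For the query range 42 to 68, the small-block layout touches three blocks with little redundant data, the single large block touches one block but reads mostly redundant data, and the merged layout sits between the two.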


Abstract

The invention relates to an index generation method and a retrieval method for scientific big data. The method comprises the following steps: determining a plurality of data blocks as hot data blocks according to the popularity of each data block; merging each hot data block with the data blocks adjacent to it according to the continuity of the hot data blocks; and generating a data index, or updating an original data index, according to the merged data blocks. The method prevents data blocks from being so large that excessive redundant information is read from disk during retrieval, which would increase data filtering overhead, and also prevents data blocks from being so small that disk access overhead increases during retrieval. The computing resources of the computer are used more fully, and the retrieval efficiency for scientific big data is greatly improved.

Description

Technical field

[0001] The invention relates to the technical field of data retrieval, and more specifically to an index generation method and a retrieval method for scientific big data.

Background technique

[0002] At present, the commonly used data retrieval methods mainly include bitmap retrieval, B-Tree retrieval, hash retrieval, and block indexing.

[0003] With bitmap retrieval, the index file itself is very large. Especially when the cardinality of the data set is large, the index file may exceed the size of the original data, so storing the index occupies a large amount of space. In addition, a large number of index files must be read when reading data during retrieval, which reduces retrieval efficiency and lowers the utilization of computing nodes.

[0004] B-Tree retrieval is mainly aimed at the application of read-write load balancing, but the current scient...
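The size concern raised in paragraph [0003] can be illustrated with simple arithmetic. The sketch below assumes a plain, uncompressed bitmap index that keeps one bitmap of one bit per row for each distinct value, and a raw column stored as fixed-width 32-bit values; both are assumptions for illustration, and compressed bitmap formats used in practice behave differently. The function names are hypothetical.

```python
def bitmap_index_bits(num_rows, cardinality):
    """Bits needed by an uncompressed bitmap index (assumed layout):
    one bitmap per distinct value, one bit per row in each bitmap."""
    return num_rows * cardinality


def raw_column_bits(num_rows, bits_per_value=32):
    """Bits needed to store the raw column at a fixed width."""
    return num_rows * bits_per_value


if __name__ == "__main__":
    rows = 1_000_000
    for cardinality in (4, 32, 100, 10_000):
        ratio = bitmap_index_bits(rows, cardinality) / raw_column_bits(rows)
        print(f"cardinality={cardinality}: index/raw size ratio = {ratio}")
```

Under these assumptions the index outgrows the raw column as soon as the cardinality exceeds the bit width of a value, which matches the observation that the index file can become larger than the original data.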


Application Information

Patent Type & Authority Applications (China)
IPC(8): G06F16/22, G06F16/23, G06F16/2455
CPC G06F16/2228, G06F16/23, G06F16/2455, Y02D10/00
Inventor 卢宇彤, 沈逸仙, 杜云飞, 钟康游, 郭贵鑫, 李江, 杜量, 曹鹏, 赵帅帅
Owner SUN YAT SEN UNIV