Index generation method and retrieval method for scientific big data

A data indexing technology for big data, applied in database indexing, structured data retrieval, digital data information retrieval, etc. It addresses the problem of a disk access rate that is low relative to computing speed, with the effects of improving retrieval efficiency, reducing disk access overhead, and optimizing data block size.

Active Publication Date: 2019-11-12

SUN YAT SEN UNIV

5 Cites, 2 Cited by


Problems solved by technology

Compared with a computer's calculation speed of hundreds of millions of operations per second, disk access is still relatively slow: the disk access rate is much lower than the speed of computation in memory.


Embodiment 1

[0052] This embodiment provides an index generation method for scientific big data, including the following steps:

[0053] Determine several of the data blocks to be hot data blocks according to the heat (popularity) of each data block;

[0054] Merge the hot data block and the data blocks adjacent to the hot data block according to the continuity of the hot data block;

[0055] Generate data indexes or update original data indexes according to the merged data blocks.

[0056] The hot data blocks are determined according to the real-time heat of each data block, and each hot data block is merged with the data blocks adjacent to it according to the continuity of the hot data blocks. Specifically: when the hot data blocks are relatively discontinuous, merge the hot data block with its adjacent non-hot data blocks; when the hot data blocks are relatively continuous, merge the hot data block with the non-hot data block and hot data block adjacent to...
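The three steps of this embodiment can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the `Block` type, the fixed `threshold` used to decide what counts as hot, summing the heat of merged blocks, and the `(start, end)` index entries are all assumptions made for the example. The rule for the continuous case is truncated in the text above, so treating each run of consecutive hot blocks as one unit and merging it with one neighbor on each side is also an assumption.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Block:
    start: int    # first record offset covered by the block (inclusive)
    end: int      # record offset just past the block (exclusive)
    heat: float   # popularity of the block; how it is measured is assumed


def merge_hot_blocks(blocks: List[Block], threshold: float) -> List[Block]:
    """Merge every run of hot blocks with its immediate neighbors.

    Assumption: a block is hot when heat >= threshold.
    """
    n = len(blocks)
    hot = [b.heat >= threshold for b in blocks]

    # Collect inclusive index ranges (lo, hi) that should become one block.
    spans: List[Tuple[int, int]] = []
    i = 0
    while i < n:
        if not hot[i]:
            i += 1
            continue
        j = i
        while j + 1 < n and hot[j + 1]:   # extend over consecutive hot blocks
            j += 1
        lo, hi = max(0, i - 1), min(n - 1, j + 1)
        if spans and lo <= spans[-1][1]:  # shares a neighbor with the previous span
            spans[-1] = (spans[-1][0], hi)
        else:
            spans.append((lo, hi))
        i = j + 1

    # Rebuild the block list, replacing each span by a single merged block.
    merged: List[Block] = []
    pos = 0
    for lo, hi in spans:
        merged.extend(blocks[pos:lo])
        group = blocks[lo:hi + 1]
        # Summing the heat of the merged blocks is an assumption.
        merged.append(Block(group[0].start, group[-1].end,
                            sum(b.heat for b in group)))
        pos = hi + 1
    merged.extend(blocks[pos:])
    return merged


def build_index(blocks: List[Block]) -> List[Tuple[int, int]]:
    """Hypothetical index: one (start, end) entry per block, in order."""
    return [(b.start, b.end) for b in blocks]
```

With an isolated hot block, the sketch merges it with its non-hot neighbors, which matches the discontinuous case described above. Blocks that are neither hot nor adjacent to a hot block are left unchanged.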

Embodiment 2

[0100] This embodiment provides a retrieval method for scientific big data, including the index generation method for scientific big data as described in Embodiment 1, and also includes the following step:

[0101] Perform data retrieval based on the data index.

[0102] The hot data block and the data blocks adjacent to it are dynamically merged according to the continuity of the hot data blocks, which optimizes the size of the data blocks. The data index is generated or updated in real time from the dynamically merged data blocks, and scientific big data is retrieved based on that index. This prevents a data block from being so large that too much redundant information is read from disk during retrieval, which would increase the overhead of data filtering, and also prevents a data block from being so small that the disk access overhead increases during retrieval. It makes full use of computer computing resources, and greatly im...
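The trade-off described in [0102] can be illustrated with a toy cost count. The sketch below assumes, purely for illustration, that the index is a list of half-open `(start, end)` record ranges, that a query is also a record range, and that cost is measured as the number of blocks read plus the number of records read that then have to be filtered out. The function name `query_cost` and this cost model are hypothetical and do not come from the patent.

```python
from typing import List, Tuple


def query_cost(index: List[Tuple[int, int]],
               q_start: int, q_end: int) -> Tuple[int, int]:
    """Return (blocks_read, redundant_records) for the query [q_start, q_end).

    blocks_read stands in for disk access overhead; redundant_records stands
    in for the data that must be filtered out after reading.
    """
    blocks_read = 0
    records_read = 0
    records_wanted = 0
    for start, end in index:
        if start < q_end and end > q_start:   # block overlaps the query range
            blocks_read += 1
            records_read += end - start
            records_wanted += min(end, q_end) - max(start, q_start)
    return blocks_read, records_read - records_wanted


# Three hypothetical layouts of the same 80 records.
small_blocks = [(i, i + 10) for i in range(0, 80, 10)]   # many small blocks
one_big_block = [(0, 80)]                                 # a single large block
merged_blocks = [(0, 30), (30, 40), (40, 80)]             # hot region merged

for name, idx in [("small", small_blocks),
                  ("big", one_big_block),
                  ("merged", merged_blocks)]:
    print(name, query_cost(idx, 40, 80))
```

For a query that covers the frequently accessed region, small blocks cost several accesses, one oversized block reads records that are then discarded, and a block sized to the hot region avoids both.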


Abstract

The invention relates to an index generation method and a retrieval method for scientific big data. The method comprises the following steps: determining a plurality of data blocks as hot data blocks according to the popularity of each data block; merging each hot data block with the data blocks adjacent to it according to the continuity of the hot data blocks; and generating a data index or updating an original data index according to the merged data blocks. The method prevents a data block from being so large that excessive redundant information is read from disk during retrieval, which would increase the data filtering overhead, and also prevents a data block from being so small that the disk access overhead increases during retrieval. Computing resources are used more fully, and the retrieval efficiency for scientific big data is greatly improved.

Description

Technical field

[0001] The invention relates to the technical field of data retrieval, and more specifically, to an index generation method and a retrieval method for scientific big data.

Background

[0002] At present, the commonly used methods of data retrieval mainly include bitmap retrieval, B-Tree retrieval, hash retrieval and block index.

[0003] When bitmap retrieval is used, the index file itself is very large. Especially when the cardinality of the data set is large, the index file may exceed the size of the original data, so the index occupies a large amount of storage space. In addition, a large number of index files must be read during retrieval, which reduces retrieval efficiency and lowers the utilization of computing nodes.

[0004] B-Tree retrieval is mainly aimed at applications with balanced read-write loads, but the current scient...
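The size concern raised for bitmap retrieval in [0003] can be made concrete with a back-of-the-envelope estimate. The sketch assumes an uncompressed, equality-encoded bitmap index (one bitmap of one bit per record for each distinct value) and 32-bit raw values. Both are assumptions for illustration; the source does not specify the encoding, and compressed bitmap schemes are smaller in practice.

```python
def bitmap_index_bits(num_records: int, cardinality: int) -> int:
    """Size of an uncompressed equality-encoded bitmap index, in bits.

    One bitmap per distinct value, one bit per record in each bitmap.
    """
    return num_records * cardinality


def raw_data_bits(num_records: int, bits_per_value: int = 32) -> int:
    """Size of the raw column, in bits, assuming fixed-width values."""
    return num_records * bits_per_value


n = 1_000_000
for cardinality in (8, 32, 1000):
    ratio = bitmap_index_bits(n, cardinality) / raw_data_bits(n)
    print(f"cardinality={cardinality}: index is {ratio:.2f}x the raw data")
```

Under these assumptions the index outgrows the raw data as soon as the cardinality exceeds the number of bits per value.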


Application Information


Patent Type & Authority: Applications (China)

IPC(8): G06F16/22, G06F16/23, G06F16/2455

CPC: G06F16/2228, G06F16/23, G06F16/2455, Y02D10/00

Inventor: 卢宇彤, 沈逸仙, 杜云飞, 钟康游, 郭贵鑫, 李江, 杜量, 曹鹏, 赵帅帅

Owner: SUN YAT SEN UNIV
