Method for building an index of RDD partition internal data, RDD point lookup method, and join RDD point lookup method

A technique for indexing the internal data of RDD partitions, applied in database indexing, structured data retrieval, and digital data information retrieval. It addresses the poor performance of the lookup API, improves query efficiency, and helps prevent OOM (out-of-memory) errors.

Inactive Publication Date: 2020-06-19

INSPUR SUZHOU INTELLIGENT TECH CO LTD




Problems solved by technology

[0007] To solve the above problems, the present invention provides a method for building an index of RDD partition internal data, an RDD point lookup method, and a join RDD point lookup method. Building an index over the internal data of each RDD partition addresses the poor performance of Spark's native lookup API and improves query efficiency. It also avoids performing an actual join of the RDDs, which effectively prevents OOM (out-of-memory) errors.


Examples


Embodiment 1

[0038] This embodiment provides a method for building an index of the internal data of an RDD partition. The index constructed for the RDD is an Array whose elements are of type HashMap. The HashMaps correspond one-to-one with the partitions, and each HashMap stores the internal data index information of its partition.

[0039] In this embodiment, the data type of the RDD is (K, V). Before indexing, first check whether the RDD has a partitioner, and perform the subsequent steps only if it does. Having a partitioner ensures that elements with the same key are placed in the same partition.
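The role of the partitioner can be illustrated with a minimal plain-Python sketch. It is a model, not Spark code: `partition_of` and `partition_by_key` are hypothetical names, and the CRC32-based hash is an assumed stand-in for whatever partitioner the RDD actually carries.

```python
import zlib


def partition_of(key, num_partitions):
    """Hypothetical hash partitioner: map a key to a partition number."""
    # CRC32 of the key's repr gives a hash that is stable across runs.
    return zlib.crc32(repr(key).encode("utf-8")) % num_partitions


def partition_by_key(pairs, num_partitions):
    """Distribute (key, value) pairs so equal keys share one partition."""
    partitions = [[] for _ in range(num_partitions)]
    for key, value in pairs:
        partitions[partition_of(key, num_partitions)].append((key, value))
    return partitions


pairs = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5)]
partitions = partition_by_key(pairs, 3)

# For every key, collect the partition numbers in which it appears.
homes = {}
for number, partition in enumerate(partitions):
    for key, _value in partition:
        homes.setdefault(key, set()).add(number)
```

Because the partition number depends only on the key, each key ends up in exactly one partition, which is the property a per-partition index relies on.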

[0040] As shown in Figures 1 and 2, the method for building the partition internal data index includes the following steps:

[0041] S1-1: define an Array that stores the partition internal data indexes. The elements of the Array are of type HashMap, and a HashMap corres...
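The steps after S1-1 are truncated in this listing, so the following is only a sketch of the index structure described in [0038], written in plain Python rather than Scala. It assumes each HashMap maps a key to the list of positions at which that key occurs in its partition; the function names are hypothetical.

```python
def build_partition_index(partition):
    """Map each key to the positions where it occurs in one partition."""
    index = {}
    for pos, (key, _value) in enumerate(partition):
        index.setdefault(key, []).append(pos)
    return index


def build_rdd_index(partitions):
    """One index (dict) per partition, stored in partition order."""
    return [build_partition_index(partition) for partition in partitions]


# Three partitions of (key, value) pairs, already grouped by key.
partitions = [
    [("a", 1), ("d", 2), ("a", 3)],
    [("b", 4)],
    [],
]
rdd_index = build_rdd_index(partitions)
```

Here `rdd_index[0]` records that key `"a"` sits at positions 0 and 2 of the first partition, so a later lookup does not need to scan that partition.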

Embodiment 2

[0050] Based on the index built in Embodiment 1, this embodiment provides an RDD point lookup method. It uses the partitioning information to locate the index of the relevant partition, uses that index to find the position of the data within the partition, and then retrieves the data.

[0051] As shown in Figure 3, the method includes the following steps:

[0052] S2-1: use the partitioner to obtain the index (number) of the partition that holds the key being looked up;

[0053] S2-2: using that partition index, fetch the corresponding partition internal data index (i.e., a HashMap) from the RDD index;

[0054] S2-3: call the HashMap's apply method to obtain pos, the position of the key within the partition; pos is an ArrayBuffer;

[0055] S2-4: call the slice method of the partition iterator with pos to obtain the slice data of ...
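Steps S2-1 to S2-4 can be sketched in plain Python as follows. This is an illustrative model, not the patented implementation: `itertools.islice` stands in for the iterator's slice method, a list of positions stands in for the ArrayBuffer, and `partition_of`, `build` and `lookup` are hypothetical names. Returning an empty list for a missing key is a choice made for this sketch.

```python
import zlib
from itertools import islice

NUM_PARTITIONS = 3


def partition_of(key):
    """Hypothetical hash partitioner: key -> partition number."""
    return zlib.crc32(repr(key).encode("utf-8")) % NUM_PARTITIONS


def build(pairs):
    """Partition the pairs by key and record each key's positions."""
    partitions = [[] for _ in range(NUM_PARTITIONS)]
    rdd_index = [{} for _ in range(NUM_PARTITIONS)]
    for key, value in pairs:
        p = partition_of(key)
        rdd_index[p].setdefault(key, []).append(len(partitions[p]))
        partitions[p].append((key, value))
    return partitions, rdd_index


def lookup(partitions, rdd_index, key):
    p = partition_of(key)                     # S2-1: partition number
    partition_index = rdd_index[p]            # S2-2: that partition's index
    positions = partition_index.get(key, [])  # S2-3: positions of the key
    values = []
    for pos in positions:                     # S2-4: slice out each position
        for _key, value in islice(iter(partitions[p]), pos, pos + 1):
            values.append(value)
    return values


partitions, rdd_index = build([("a", 1), ("b", 2), ("a", 3), ("c", 4)])
found = lookup(partitions, rdd_index, "a")
missing = lookup(partitions, rdd_index, "zzz")
```

The lookup touches only one partition and only the recorded positions, instead of comparing the key against every element.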

Embodiment 3

[0059] Based on Embodiments 1 and 2, this embodiment provides a join RDD point lookup method. It uses the index built by the method of Embodiment 1 and the lookup method of Embodiment 2 to query the result of a natural join of two RDDs.

[0060] As shown in Figure 4, the method includes the following steps:

[0061] S3-1: apply the method of Embodiment 1 to each of the two RDDs to be joined, building an index for each;

[0062] S3-2: apply the method of Embodiment 2 to each of the two RDDs to find the values that match the key;

[0063] S3-3: combine the query results of the two RDDs and return them in the form of join data.
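A plain-Python sketch of S3-1 to S3-3 is given below. The `IndexedPairs` class and `join_lookup` function are hypothetical, and the `(key, (left_value, right_value))` result shape is an assumption modeled on the usual output of a key-based join; the source does not specify the exact form of the join data.

```python
import zlib
from itertools import islice, product


class IndexedPairs:
    """Hypothetical model of a key-partitioned dataset plus its index."""

    def __init__(self, pairs, num_partitions=3):
        self.num_partitions = num_partitions
        self.partitions = [[] for _ in range(num_partitions)]
        self.index = [{} for _ in range(num_partitions)]
        for key, value in pairs:
            p = self._partition_of(key)
            # Record the position the pair is about to occupy.
            self.index[p].setdefault(key, []).append(len(self.partitions[p]))
            self.partitions[p].append((key, value))

    def _partition_of(self, key):
        return zlib.crc32(repr(key).encode("utf-8")) % self.num_partitions

    def lookup(self, key):
        """Point lookup through the index, without scanning the partition."""
        p = self._partition_of(key)
        positions = self.index[p].get(key, [])
        return [
            value
            for pos in positions
            for _key, value in islice(iter(self.partitions[p]), pos, pos + 1)
        ]


def join_lookup(left, right, key):
    """S3-2 and S3-3: look the key up on both sides, then pair the values."""
    return [(key, (v, w)) for v, w in product(left.lookup(key), right.lookup(key))]


# S3-1: build an indexed structure for each side of the join.
users = IndexedPairs([("u1", "Ann"), ("u2", "Bob")])
orders = IndexedPairs([("u1", "book"), ("u1", "pen"), ("u3", "cup")])

joined = join_lookup(users, orders, "u1")
```

Only the rows for the requested key are paired, so the full joined dataset is never materialized; a key present on just one side yields an empty result.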


Abstract

The invention discloses a method for building an index of RDD partition internal data, an RDD point lookup method, and a join RDD point lookup method. An index is built over the internal data of each RDD partition: a HashMap stores the position of each piece of data within the partition, and the indexes of all partitions are combined to form the index of the RDD. When looking up a key, there is no need to traverse all the data in the partition. Instead, the position of the key in the partition is found directly through the HashMap, and the corresponding value is then read from that position using the slice interface of the partition Iterator. This addresses the poor performance of Spark's native lookup API and improves query efficiency. In addition, by building indexes for the two RDDs to be joined and then querying those indexes, an actual join of the RDDs can be avoided, which effectively prevents OOM.

Description

Technical field

[0001] The invention relates to the field of RDD indexing, in particular to a method for building an index of RDD partition internal data, an RDD point lookup method, and a join RDD point lookup method.

Background

[0002] With the development of big data processing, the demands on processing speed keep rising. Traditional distributed big data platforms based on disk storage increasingly struggle with such workloads, especially machine learning and iterative computation. In-memory computing emerged in response. It keeps data in memory and does not need to frequently save intermediate results to disk during processing, thus avoiding unnecessary I/O overhead. The advantages of in-memory computing are significant. First of all, it can effectively accelerate the complex analysis and pro...


Application Information


Patent Type & Authority: Applications (China)

IPC(8): G06F16/22, G06F16/2455, G06F16/25, G06F16/28

CPC: G06F16/2228, G06F16/2456, G06F16/252, G06F16/283

Inventor: 黄伟

Owner: INSPUR SUZHOU INTELLIGENT TECH CO LTD
