Data index establishment method and device, data retrieval method and device, equipment and storage medium

A data index establishment technique, applied in the field of data processing, that addresses reduced retrieval accuracy, long retrieval time, and exponentially increasing retrieval difficulty.

Pending Publication Date: 2020-02-21

PING AN TECH (SHENZHEN) CO LTD



Problems solved by technology

[0002] The existing K-means clustering algorithm is easily affected by the choice of initial points. When the initial points are not selected well, the correct clustering result cannot be obtained, so the stability of the algorithm is poor. The algorithm also obtains its quantization results by clustering the entire vector in a single pass, so the amount of cluster-center data that must be represented is large.

[0003] However, the feature vector retrieval method based on the above K-means clustering algorithm obtains rough cluster centers directly through a single round of training, regardless of the amount of data, which makes it a static learning process. This method does not scale to retrieval over massive feature vector data. When the cluster centers are not selected properly, the result of 1vN retrieval (finding the feature vector most similar to the query vector among N feature vectors) deviates considerably, which reduces both retrieval accuracy and retrieval efficiency. At the same time, as retrieval objects (unstructured data such as images, videos, and music) become more complex, the difficulty of retrieval increases exponentially. Complex retrieval objects usually need to be represented by multi-dimensional vectors. When the retrieval database holds a large amount of data, using brute-force search to perform multi-dimensional vector calculations on the retrieval samples (which include multiple comparison objects) requires a very large amount of computation, so retrieval takes too long to meet user needs.
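To make the cost of brute-force 1vN retrieval concrete, the following is a minimal sketch, not taken from the patent. It assumes Euclidean distance as the similarity measure, and the names `euclidean`, `brute_force_1vn`, and `database` are hypothetical. Every query is compared against all N stored vectors, so the work grows with N times the vector dimension.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def brute_force_1vn(query, vectors):
    """Return (index, distance) of the stored vector closest to the query.

    Every stored vector is visited once, so the cost is O(N * d)
    for N vectors of dimension d.
    """
    best_index, best_dist = -1, math.inf
    for i, v in enumerate(vectors):
        d = euclidean(query, v)
        if d < best_dist:
            best_index, best_dist = i, d
    return best_index, best_dist


# Toy data standing in for feature vectors extracted from images.
database = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [9.0, 8.0]]
best_index, best_dist = brute_force_1vn([4.5, 5.5], database)
print(best_index, best_dist)
```

With millions of high-dimensional vectors this full scan is what makes retrieval time grow too long, which is the motivation for building an index in advance.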


Embodiment Construction

[0038] The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments obtained by persons of ordinary skill in the art on the basis of these embodiments, without creative effort, fall within the protection scope of the present invention.

[0039] The cluster retrieval method for data samples provided by the present invention can be applied in an application environment such as the one shown in Figure 1, in which a client communicates with a server over a network. Clients include, but are not limited to, personal computers, notebook computers, smartphones, tablet computers, cameras and portable wearable devices. The server can be implemented by an independent server or a server cluster c...


Abstract

The invention discloses a data index establishment method and device, a data retrieval method and device, equipment and a storage medium. In the data index establishment process, a segmented clustering model is first selected according to the data size level of the data set and used to perform primary clustering on the data samples in the data set, yielding first-class clustering centers. Secondary clustering is then performed using a quantizer associated with the first-class clustering centers, yielding second-class clustering centers, and an index table is obtained based on the second-class clustering centers. In the data retrieval process, image data retrieval is performed using the index table obtained in the data index establishment process. Because massive sample data is segmented and clustered multiple times in advance and indexes are established, the clustering effect and the precision of the clustering centers are improved. In the data retrieval process, high-precision and high-efficiency image data retrieval is achieved based on the pre-established index.
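The two-stage idea in the abstract can be illustrated with a small sketch. This is not the patented procedure: it assumes plain k-means with a deterministic initialization for both stages, fixed cluster counts `k1` and `k2` instead of a model chosen by data size level, and a simple nested table as the index. All function names (`kmeans`, `build_index`, `search`) are hypothetical.

```python
def dist2(a, b):
    """Squared Euclidean distance (assumed similarity measure)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(p, centers):
    """Position of the center closest to point p."""
    return min(range(len(centers)), key=lambda i: dist2(p, centers[i]))


def mean(points):
    return [sum(col) / len(points) for col in zip(*points)]


def kmeans(points, k, iters=20):
    """Plain k-means; initial centers are evenly spaced samples (assumption)."""
    k = min(k, len(points))
    step = len(points) // k
    centers = [list(points[i * step]) for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        # Keep the old center if a group received no points.
        centers = [mean(g) if g else c for g, c in zip(groups, centers)]
    return centers


def build_index(samples, k1, k2):
    """Primary clustering, then secondary clustering inside each primary cluster."""
    coarse = kmeans(samples, k1)
    buckets = [[] for _ in coarse]
    for sid, s in enumerate(samples):
        buckets[nearest(s, coarse)].append(sid)
    index = []
    for ids in buckets:
        if not ids:
            index.append(([], {}))
            continue
        fine = kmeans([samples[i] for i in ids], k2)
        table = {}
        for sid in ids:
            table.setdefault(nearest(samples[sid], fine), []).append(sid)
        index.append((fine, table))
    return coarse, index


def search(query, samples, coarse, index):
    """Descend coarse -> fine, then compare only against that cell's samples."""
    fine, table = index[nearest(query, coarse)]
    if not fine:
        return None
    candidates = table.get(nearest(query, fine), [])
    if not candidates:
        return None
    return min(candidates, key=lambda sid: dist2(query, samples[sid]))


# Toy vectors standing in for image feature vectors.
samples = [
    [0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [1.1, 0.9],
    [10.0, 10.0], [10.1, 10.2], [11.0, 11.0], [11.2, 10.9],
]
coarse, index = build_index(samples, k1=2, k2=2)
result = search([10.9, 11.1], samples, coarse, index)
print(result)
```

In this sketch a query is compared with two coarse centers, two fine centers, and the two samples in the selected cell, rather than with all eight samples. Because only one cell is searched, the true nearest neighbor can be missed when it lies in a neighboring cell; well-placed cluster centers reduce that risk.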

Description

Technical field

[0001] The invention relates to the field of data processing, and in particular to a data index establishment method, a data retrieval method, and the corresponding device, equipment and storage medium.


Application Information


IPC(8): G06F16/51, G06F16/53, G06F16/55

CPC: G06F16/51, G06F16/53, G06F16/55

Inventor: 张艳, 孙太武, 周超勇, 刘玉宇

Owner: PING AN TECH (SHENZHEN) CO LTD
