Algorithm for quickly matching mass data

A technology for matching operations on massive data, applied in the field of data matching. It addresses the problem of inaccurate matching results and improves matching accuracy.

Pending Publication Date: 2022-03-01
JIANGSU SUNYU INFORMATION TECH CO LTD

AI Technical Summary

Problems solved by technology

[0003] The purpose of the present invention is to solve the problem of inaccurate data-matching results in the prior art by proposing a fast matching algorithm for massive data.



Examples

Embodiment

[0024] Referring to Figure 1, an algorithm for fast matching of massive data includes the following steps:

[0025] S1. HubbleDotNet integrates full-text search with a relational database, and full-text and relational queries are performed on the data in the database through SQL statements;

[0026] S2. On the basis of the TF-IDF algorithm, a position function fp(t,d,q) is added:

[0027] S3. After accurate data is obtained through HubbleDotNet, the system performs the matching operation on the data using the edit distance algorithm combined with its own specific recursive algorithm.
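Step S3 can be illustrated with a minimal Python sketch of edit distance. The text does not describe the system's "own specific recursive algorithm", so the recursion below is only the textbook Levenshtein recurrence with memoization. The names `edit_distance` and `match_degree`, and the normalization by the longer string's length, are assumptions made for illustration.

```python
from functools import lru_cache


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a memoized recursion.

    Recursion depth grows with len(a) + len(b), so this sketch is
    intended for short strings such as names or titles.
    """

    @lru_cache(maxsize=None)
    def dist(i: int, j: int) -> int:
        # dist(i, j) is the distance between the prefixes a[:i] and b[:j].
        if i == 0:
            return j  # insert the remaining j characters
        if j == 0:
            return i  # delete the remaining i characters
        cost = 0 if a[i - 1] == b[j - 1] else 1
        return min(
            dist(i - 1, j) + 1,         # deletion
            dist(i, j - 1) + 1,         # insertion
            dist(i - 1, j - 1) + cost,  # substitution or match
        )

    return dist(len(a), len(b))


def match_degree(a: str, b: str) -> float:
    """Hypothetical matching value in [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


print(edit_distance("kitten", "sitting"))  # 3
```

In a two-stage pipeline of the kind described, such a function would be applied only to the candidates already returned by the full-text query, which keeps the quadratic cost of edit distance off the bulk of the data.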

[0028] The HubbleDotNet component itself is responsible for building the inverted index of the full-text data and stores the index in the directory specified for the table; data storage is handled by the relational database associated with HubbleDotNet.
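The idea of an inverted index can be sketched in a few lines of Python. This is not HubbleDotNet's implementation or on-disk format; the whitespace tokenizer and the names `tokenize`, `build_inverted_index`, and `search_all` are assumptions chosen for illustration. Each term maps to the documents containing it, together with the positions at which it occurs.

```python
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase whitespace splitting stands in for a real analyzer.
    return text.lower().split()


def build_inverted_index(docs: dict[int, str]) -> dict[str, dict[int, list[int]]]:
    """Map each term to {doc_id: [positions of the term in that document]}."""
    index: dict[str, dict[int, list[int]]] = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, term in enumerate(tokenize(text)):
            index[term][doc_id].append(pos)
    # Convert nested defaultdicts to plain dicts so lookups never create keys.
    return {term: dict(postings) for term, postings in index.items()}


def search_all(index: dict[str, dict[int, list[int]]], query: str) -> set[int]:
    """Return the ids of documents that contain every term of the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], {}))
    for term in terms[1:]:
        result &= set(index.get(term, {}))
    return result


docs = {
    1: "fast matching of massive data",
    2: "massive data storage",
    3: "fast full-text search",
}
index = build_inverted_index(docs)
print(search_all(index, "massive data"))  # {1, 2}
```

Storing positions alongside document ids is what makes position-aware scoring possible later without rescanning the original text.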

[0029] The basic scoring algorithm formula of HubbleDotNet is as follows:

[0030]

[0031] FieldRank is the field weight;

[0032] Rank(t,q) is the weight o...
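Paragraphs [0026] and [0029] refer to a TF-IDF-based score extended with a position function fp(t,d,q), but the formula itself is not reproduced in this text. The Python sketch below therefore shows only plain TF-IDF with a pluggable position factor. The smoothed IDF variant, the function names, and in particular `early_occurrence_boost` are hypothetical and are not HubbleDotNet's scoring formula.

```python
import math
from collections import Counter
from typing import Callable, Sequence

# Signature of a position factor fp(t, d, q): term, document tokens, query tokens.
PositionFn = Callable[[str, Sequence[str], Sequence[str]], float]


def no_position_boost(term: str, doc: Sequence[str], query: Sequence[str]) -> float:
    # Neutral factor: the score reduces to plain TF-IDF.
    return 1.0


def early_occurrence_boost(term: str, doc: Sequence[str], query: Sequence[str]) -> float:
    # Hypothetical fp: a term that first appears earlier gets a larger factor.
    first = list(doc).index(term)  # only called for terms present in doc
    return 1.0 + 1.0 / (1 + first)


def score(
    query: Sequence[str],
    doc: Sequence[str],
    corpus: Sequence[Sequence[str]],
    fp: PositionFn = no_position_boost,
) -> float:
    """Sum over query terms of tf * idf * fp(t, d, q)."""
    n_docs = len(corpus)
    counts = Counter(doc)
    total = 0.0
    for term in set(query):
        tf = counts[term]
        if tf == 0:
            continue  # term absent from this document
        df = sum(1 for d in corpus if term in d)
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0  # smoothed IDF (assumption)
        total += (tf / len(doc)) * idf * fp(term, doc, query)
    return total


corpus = [
    ["massive", "data", "matching"],
    ["data", "storage"],
    ["matching", "massive", "data"],
]
query = ["matching"]
ranked = sorted(
    range(len(corpus)),
    key=lambda i: score(query, corpus[i], corpus, early_occurrence_boost),
    reverse=True,
)
```

With the neutral factor, the first and third documents tie because they contain the same terms; a position factor is one way to break such ties, which is the kind of refinement the added fp(t,d,q) term suggests.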

Abstract

The invention discloses an algorithm for quickly matching massive data, comprising the following steps: HubbleDotNet integrates full-text search with a relational database, and full-text and relational queries are performed on the data in the database through SQL (Structured Query Language) statements; a position function fp(t,d,q) is added on the basis of the TF-IDF algorithm; and after accurate data is obtained through HubbleDotNet, the system performs the matching operation on the data using the edit distance algorithm combined with a specific recursive algorithm. The advantage is that, when full-text search database components are compared over multiple data items, the matching accuracy of HubbleDotNet is higher than that of similar systems. After the high-precision data is obtained, the system calculates the exact matching degree of the data by edit distance, so that the matching value is computed accurately and efficiently.

Description

Technical field

[0001] The invention relates to the technical field of data matching, in particular to an algorithm for quickly matching massive data.

Background technique

[0002] Current full-text search database components have problems with matching relevance: the data they return is often not the most desired result. Some systems try to solve the problem through word segmentation. This improves matters but does not solve the problem fundamentally, and in an English-language environment word segmentation has no effect at all. HubbleDotNet's algorithm draws on the algorithms of Lucene and SQL Server and improves on them considerably. Compared with other similar full-text search database components, its matching relevance is significantly better. The system sorts the matching data by score and then uses the edit distance algorithm to calculate the exact matching value. Thus, the matching speed and matching accuracy of ma...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/2455, G06F16/242, G06F16/2458, G06F16/28
CPC: G06F16/24553, G06F16/2433, G06F16/2462, G06F16/284
Inventor: 胡永伟
Owner: JIANGSU SUNYU INFORMATION TECH CO LTD