Numerical value and text mixed inverted index algorithm based on multilayer-optimization balanced tree

Applies inverted index and balanced tree techniques to indexing algorithms. It addresses the low performance of text-based comparison and the inability to perform range comparison and exact comparison on numerical values.

Inactive Publication Date: 2012-04-04
ZHEJIANG TIANYU INFORMATION TECH
0 Cites, 10 Cited by

AI Technical Summary

Problems solved by technology

However, because this method changes the actual storage type of the value, query conditions can only be compared against the text representation of the value. Range comparison and exact comparison of numerical values cannot be performed, and text comparison is significantly slower than numerical comparison.
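The range-comparison problem can be illustrated with a minimal sketch. It assumes numbers are stored as plain, unpadded decimal strings, which the source does not specify; the function names `in_range_as_text` and `in_range_as_number` are hypothetical.

```python
def in_range_as_text(value, low, high):
    """Range test on the text representation (lexicographic order)."""
    return str(low) <= str(value) <= str(high)


def in_range_as_number(value, low, high):
    """Range test on the numeric value itself."""
    return low <= value <= high


# 9 lies in [5, 10] numerically, but "9" sorts after "10" as text.
missed = (in_range_as_text(9, 5, 10), in_range_as_number(9, 5, 10))

# 15 lies outside [1, 2] numerically, but "15" sorts between "1" and "2".
false_hit = (in_range_as_text(15, 1, 2), in_range_as_number(15, 1, 2))
```

Under these assumptions the text comparison both misses a true match and reports a false one, which is why a text-only inverted index cannot answer numerical range queries directly.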



Embodiment Construction

[0036] This section describes the specific implementation steps of the numerical and text hybrid inverted index algorithm based on multi-level optimized balanced trees.

[0037] 1. Index construction steps

[0038] (1). Creation of layer 0 nodes

[0039] a. Sort the (document number, value) pairs in document set N by value, and let V be the set of all unique values.

[0040] b. Divide the list sorted in step a into b blocks, each of length N / b. Store the minimum value (first value) and maximum value (last value) of each block in the arrays T_min and T_max.

[0041] c. Create an inverted list for each block, sorted by document number. The blocks satisfy: for 1 <= i < j <= b, no value in block i is greater than any value in block j.

[0042] The cost of creating the layer 0 nodes is dominated by the sort in step a: the time complexity is O(N log N) and the space complexity is O(N).
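Steps a to c can be sketched as follows. This is a minimal illustration and not the patented implementation: it assumes the input is a list of (document number, value) tuples, and the function name `build_layer0` and its return layout are hypothetical. When N is not divisible by b, the sketch rounds the block length up, so it may produce fewer than b blocks.

```python
from math import ceil


def build_layer0(pairs, b):
    """Build layer 0 blocks from (doc_id, value) pairs.

    Returns (t_min, t_max, postings): per-block minimum value,
    per-block maximum value, and per-block document numbers.
    """
    if b <= 0:
        raise ValueError("b must be positive")
    if not pairs:
        return [], [], []

    # Step a: sort by value; this sort dominates the O(N log N) cost.
    ordered = sorted(pairs, key=lambda p: p[1])

    # Step b: cut the sorted list into blocks of about N / b entries.
    size = max(1, ceil(len(ordered) / b))
    t_min, t_max, postings = [], [], []
    for start in range(0, len(ordered), size):
        block = ordered[start:start + size]
        t_min.append(block[0][1])    # first value in the block
        t_max.append(block[-1][1])   # last value in the block
        # Step c: inverted list for the block, sorted by document number.
        postings.append(sorted(doc for doc, _ in block))
    return t_min, t_max, postings


pairs = [(1, 30), (2, 10), (3, 20), (4, 10), (5, 40), (6, 25)]
t_min, t_max, postings = build_layer0(pairs, 3)
```

Because the blocks are cut from a list already sorted by value, the ordering condition in step c holds by construction: the maximum of each block is never greater than the minimum of the next.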

[0043] (2). Creation of additional layers

[0044] a. For each (document number, value) pair in document set N, sort by document number and put i...


Abstract

The invention discloses a numerical value and text mixed inverted index algorithm based on a multilayer-optimization balanced tree. The traditional text inverted index only supports indexing on free texts, wherein the content of a numerical value is generally converted into a text, and then an inverted index is established according to the content of the text. According to the invention, the traditional text inverted index algorithm is optimized and expanded so as to be capable of supporting the indexing on numerical value and text mixed data, and a reasonable balance is realized in the aspects of query performance, index space and establishing performance. The index algorithm disclosed by the invention is suitable for being used in a mixed type data management engine and can be used for increasing the composite query performance of the numerical value and text mixed type data.

Description

Technical field

[0001] The invention relates to the fields of information retrieval and database management systems, in particular to an indexing algorithm in a numerical and text mixed data management system.

Background technique

[0002] Database management systems and search engine technologies originated from mutually independent application requirements, but as the proportion of unstructured data in practical applications increases, the two technologies are tending to merge. A database system uses a B+ tree index on numeric fields to improve query performance, while a full-text retrieval system uses an inverted index to improve full-text query performance; the traditional inverted index structure, however, is applicable only to text. Meanwhile, the search requirements of database systems for text fields and the query requirements of search engine systems for numerical fields are increasing day by day.

[0003] The gener...


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/30
Inventor: 郑益
Owner: ZHEJIANG TIANYU INFORMATION TECH