Data inquiry method and device

A data query method and device for data streams, applied in the information technology field. It addresses the large index footprint, high lower limit of the filtering threshold, and inefficient update support of existing approaches, and achieves a small index footprint, a small candidate string set, and good query fault tolerance.

Active Publication Date: 2013-07-24
INST OF INFORMATION ENG CHINESE ACAD OF SCI

AI Technical Summary

Problems solved by technology

[0034] Within the filter-and-verify framework, many studies extract and filter features using fixed-length q-grams. This approach has three shortcomings:
1. For short strings, high-quality features cannot be guaranteed. If q is small, many strings may share the same features, producing a large number of candidate strings; if q is large, strings that may actually be similar can be filtered out.
2. If q is small, the lower limit of the filtering threshold is higher and a larger index is required.
3. Updates are not supported efficiently, especially when prefix filtering is used. When some data is updated, the entire IDF-based global order may change, which forces features to be re-selected and indexes to be rebuilt.
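The trade-off in the choice of q can be made concrete with a small sketch. It assumes the standard count-filtering bound from the q-gram literature (two strings within edit distance k share at least max(|s|, |t|) - q + 1 - k*q q-grams); the function names `qgrams` and `count_filter_lower_bound` are illustrative and do not come from the patent.

```python
def qgrams(s: str, q: int) -> list[str]:
    """Return all overlapping substrings of length q (no padding)."""
    return [s[i:i + q] for i in range(len(s) - q + 1)]


def count_filter_lower_bound(length: int, q: int, k: int) -> int:
    """Minimum number of shared q-grams for strings within edit distance k.

    `length` is the length of the longer string. Each edit operation can
    destroy at most q of its (length - q + 1) q-grams.
    """
    return (length - q + 1) - k * q


word = "stream"
for q in (2, 3, 4):
    # Prints q, the number of q-grams, and the lower bound for k = 1.
    print(q, len(qgrams(word, q)), count_filter_lower_bound(len(word), q, k=1))
```

For a six-character string and k = 1, the bound is 3 at q = 2, 1 at q = 3, and -1 at q = 4. A smaller q gives a higher bound but more q-grams to index, while a non-positive bound means the count filter can no longer prune anything.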
[0038] Trie-based methods have two disadvantages:
1. They are inefficient for long strings. Building a Trie requires comparing each character of a string against the existing nodes of the tree one by one, so insertion and query are slow for long strings.
2. Preprocessing and indexing the string set takes a long time, and the index occupies a lot of space.
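The per-character cost of Trie insertion can be illustrated with a minimal sketch. The `Trie` class below is a generic textbook structure written for this illustration, not the specific method criticized in the source; `node_count` is an assumed helper attribute used as a rough proxy for index size.

```python
class TrieNode:
    def __init__(self) -> None:
        self.children: dict[str, "TrieNode"] = {}
        self.is_end = False


class Trie:
    def __init__(self) -> None:
        self.root = TrieNode()
        self.node_count = 1  # counts the root node

    def insert(self, word: str) -> int:
        """Insert a word and return the number of character steps taken."""
        node = self.root
        steps = 0
        for ch in word:
            steps += 1  # one node comparison per character
            if ch not in node.children:
                node.children[ch] = TrieNode()
                self.node_count += 1
            node = node.children[ch]
        node.is_end = True
        return steps


trie = Trie()
for w in ("data", "date", "stream"):
    print(w, trie.insert(w), trie.node_count)
```

Every insertion walks one node per character, so the work grows with string length, and strings that share no prefix each add a full chain of new nodes to the index.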
In a data stream environment, it is clearly infeasible to obtain all the data before indexing. Even when an index is built over only part of the stream, limited memory and real-time query requirements place strict limits on the size of the index and the time spent building it. If the index is too large, it may not fit in memory; if building it takes too long, the data in the sliding window may expire before the index is ready.
[0051] At present, keyword queries over data streams mostly rely on exact matching, which has no fault tolerance and cannot handle errors in strings.
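The difference between exact matching and a fault-tolerant match can be shown with the classic Levenshtein edit distance. This is a minimal dynamic-programming sketch; the function name `edit_distance` is chosen for illustration only.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


keyword, observed = "keyword", "keywrd"
print(keyword == observed)                      # exact match fails
print(edit_distance(keyword, observed) <= 1)    # match within threshold 1
```

With an edit distance threshold of 1, the misspelled string is still matched, whereas exact matching rejects it.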



Examples


Embodiment Construction

[0090] The principles and features of the present invention are described below in conjunction with the accompanying drawings, and the examples given are only used to explain the present invention, and are not intended to limit the scope of the present invention.

[0091] Figure 4 is a schematic flowchart of the data query method in an embodiment of the present invention.

[0092] Step 1: receive the query conditions provided by the user. The query conditions include query keywords, an edit distance threshold, and a sliding window width.

[0093] Instead of query keywords plus an edit distance threshold, the query conditions can also be query keywords plus a similarity threshold based on a similarity function.

[0094] This step corresponds to Step 1 in Figure 4.
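The query conditions received in Step 1 could be held in a small record type. The sketch below is hypothetical: the class name `QueryConditions`, its field names, and the choice to express the sliding window width as a count of basic windows are all assumptions, since the text does not state the unit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryConditions:
    """Hypothetical container for the three query conditions of Step 1."""
    keywords: tuple[str, ...]
    edit_distance_threshold: int
    sliding_window_width: int  # assumed here to be a count of basic windows

    def __post_init__(self) -> None:
        if not self.keywords:
            raise ValueError("at least one query keyword is required")
        if self.edit_distance_threshold < 0:
            raise ValueError("edit distance threshold must be non-negative")
        if self.sliding_window_width < 1:
            raise ValueError("sliding window width must be at least 1")


conditions = QueryConditions(("stream",), edit_distance_threshold=1,
                             sliding_window_width=4)
print(conditions)
```

Validating the values on construction keeps malformed queries from reaching the indexing and filtering steps.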

[0095] Step 2: extract the feature values of the query keywords to form the keyword feature index, and extract the feature values of the basic windows in the current sliding window to form the feature index of the sliding window, ...
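One way to picture a sliding window whose feature index is built from per-basic-window indexes is a bounded queue. The sketch below is an assumption-laden illustration: the patent's own feature extraction is not shown in this excerpt, so plain 2-grams stand in for it, and `SlidingWindowIndex`, `build_basic_window_index`, and `lookup` are hypothetical names.

```python
from collections import defaultdict, deque


def build_basic_window_index(strings, q=2):
    """Map each q-gram (stand-in feature) to the strings that contain it."""
    index = defaultdict(set)
    for s in strings:
        for i in range(len(s) - q + 1):
            index[s[i:i + q]].add(s)
    return dict(index)


class SlidingWindowIndex:
    """Queue of per-basic-window indexes; the oldest expires automatically."""

    def __init__(self, num_basic_windows, q=2):
        self.q = q
        self.queue = deque(maxlen=num_basic_windows)

    def add_basic_window(self, strings):
        # Appending to a full deque drops the oldest basic window's index.
        self.queue.append(build_basic_window_index(strings, self.q))

    def lookup(self, feature):
        """Return every string in the current window that has this feature."""
        found = set()
        for basic_index in self.queue:
            found |= basic_index.get(feature, set())
        return found


window = SlidingWindowIndex(num_basic_windows=2)
window.add_basic_window(["data", "date"])
window.add_basic_window(["stream"])
window.add_basic_window(["query"])  # the first basic window expires here
print(sorted(window.lookup("da")))
print(sorted(window.lookup("st")))
```

Because each basic window carries its own index, expiring old data only requires dropping one queue element rather than rebuilding an index over the whole window.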


Abstract

The invention relates to a data query method and device. The method comprises the following steps: receiving query conditions provided by a user, where the query conditions comprise query keywords, an edit distance threshold, and a sliding window width; extracting feature values of the query keywords to form a keyword feature index; extracting feature values of the basic windows in the current sliding window to form a feature index of the sliding window, where the sliding window comprises a set number of basic windows and its feature index is a queue consisting of the feature indexes of all basic windows in the sliding window; triggering a query on the current sliding window when a preset query trigger condition is reached; and filtering the feature index of the current sliding window according to the keyword feature index and the edit distance threshold to obtain a candidate string set that meets the filtering lower limit. The method and device effectively compensate for the limitations of exact keyword queries in data stream scenarios and provide good query fault tolerance.
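The filtering step in the abstract can be sketched as follows. This is not the patent's algorithm: plain 2-grams stand in for its feature values, the lower limit is the standard count-filtering bound from the q-gram literature, and the final verification pass is the usual second stage of a filter-and-verify framework rather than something the abstract states. All function names are hypothetical.

```python
from collections import Counter


def qgrams(s, q):
    return [s[i:i + q] for i in range(len(s) - q + 1)]


def edit_distance(a, b):
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def filter_candidates(keyword, window_strings, k, q=2):
    """Keep strings that share enough q-grams with the keyword."""
    key_grams = Counter(qgrams(keyword, q))
    candidates = []
    for s in window_strings:
        if abs(len(s) - len(keyword)) > k:
            continue  # length filter: too many insertions or deletions needed
        lower = max(len(s), len(keyword)) - q + 1 - k * q
        shared = sum((key_grams & Counter(qgrams(s, q))).values())
        if shared >= lower:
            candidates.append(s)
    return candidates


def verify(keyword, candidates, k):
    """Confirm each candidate with an exact edit distance computation."""
    return [s for s in candidates if edit_distance(keyword, s) <= k]


stream_strings = ["stream", "streem", "steam", "dream", "query", "string"]
candidates = filter_candidates("stream", stream_strings, k=1)
print(candidates)
print(verify("stream", candidates, k=1))
```

In this run the filter keeps "stream", "streem", "steam", and "dream", and verification then drops "dream" (edit distance 2). The filter may admit false positives but never discards a string within the threshold, which is why a small candidate set matters for overall cost.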

Description

Technical field

[0001] The invention relates to the field of information technology, in particular to a data query method and device.

Background

[0002] The string fuzzy query problem, also known as the string similarity query problem, has long been a research hotspot in data query and processing and is widely used across research fields. Examples include: similarity joins on strings between tables in databases and data warehouses for data integration and cleaning; approximate pattern matching of DNA or protein sequences in bioinformatics; the "do you mean" prompt that search engines show for user input errors; and spell checking and error correction in application software.

[0003] String similarity is measured by the result of a "similarity function" or "distance function". Commonly used similarity functions include Overlap similarity, Jaccard similarity, Cosine similarity, and Dice similarity. Similarity functions are o...
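The four similarity functions named in [0003] can be sketched over feature sets. The definitions below are the common set-based forms, assumed here because the source text is truncated before it defines them; in particular, Overlap is taken as the raw intersection size, although some texts normalize it.

```python
import math


def overlap(a: set, b: set) -> int:
    """Overlap similarity: number of shared features."""
    return len(a & b)


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: shared features over all distinct features."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cosine(a: set, b: set) -> float:
    """Cosine similarity for sets: |a & b| / sqrt(|a| * |b|)."""
    return len(a & b) / math.sqrt(len(a) * len(b)) if a and b else 0.0


def dice(a: set, b: set) -> float:
    """Dice similarity: 2 * |a & b| / (|a| + |b|)."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


# 2-gram sets of "data" and "date", used as example features.
x = {"da", "at", "ta"}
y = {"da", "at", "te"}
print(overlap(x, y), jaccard(x, y), cosine(x, y), dice(x, y))
```

All four are computed from the same intersection size and differ only in how they normalize it, which is why a threshold on one can often be converted into a bound on the overlap.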


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/30
Inventors: 崔甲, 孟丹, 王伟平, 陈重韬
Owner: INST OF INFORMATION ENG CHINESE ACAD OF SCI