System and method for rapidly searching unstructured data

A technology for fast retrieval of unstructured data, applied in the field of information technology. It addresses the high system resource consumption and low performance of existing approaches, and achieves low resource consumption, high speed, and low read/write pressure.

Inactive Publication Date: 2013-12-11
南京烽火星空通信发展有限公司

AI Technical Summary

Problems solved by technology

Although text data can be retrieved through an inverted index, when a user searches with multiple search conditions, or when the search conditions contain multiple search terms, a large number of text identifier comparisons are required. This consumes substantial system resources and results in low performance.


Embodiment Construction

[0024] The system for fast retrieval of unstructured data described in this patent application has the module layout shown in Figure 1. It includes a query processing module, a query condition parsing module, a big data storage module, a Bloom filter filtering module, an index building module, and a Bloom filter building module.

[0025] The big data storage module includes an inverted index table and a data record table. The data update steps for the inverted index table include:

[0026] 1. The index building module receives all the segmented terms contained in each data record and uses the inverted index construction method to generate mapping data from each term to record identifiers;

[0027] 2. The mapping data is sent to the Bloom filter building module to generate a Bloom filter; the length of the Bloom filter is proportional to the number of data records containing the term; use each record identi...
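
The two update steps above can be illustrated with a minimal Python sketch. It is not the patent's implementation: the `BloomFilter` class, the SHA-256-based hashing, the number of hash functions, and the `BITS_PER_RECORD` proportionality constant are all assumptions chosen for illustration, and the way record identifiers are inserted follows standard Bloom filter construction because the source text is truncated at that point.

```python
import hashlib
from collections import defaultdict

BITS_PER_RECORD = 10  # assumed proportionality constant, not from the source


class BloomFilter:
    """Minimal Bloom filter over string record identifiers (illustrative)."""

    def __init__(self, num_bits, num_hashes=3):
        self.num_bits = max(8, num_bits)
        self.num_hashes = num_hashes
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, item):
        # Derive each bit position by salting a SHA-256 hash with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def build_inverted_index(records):
    """Step 1: map each term to the set of record identifiers containing it."""
    index = defaultdict(set)
    for record_id, terms in records.items():
        for term in terms:
            index[term].add(record_id)
    return index


def build_term_filters(index):
    """Step 2: one Bloom filter per term, sized by its number of records."""
    filters = {}
    for term, record_ids in index.items():
        bloom = BloomFilter(num_bits=BITS_PER_RECORD * len(record_ids))
        for record_id in record_ids:
            bloom.add(record_id)
        filters[term] = bloom
    return filters


# Hypothetical data records, already split into terms.
records = {
    "r1": ["bloom", "filter", "index"],
    "r2": ["index", "search"],
    "r3": ["bloom", "search"],
}
index = build_inverted_index(records)
filters = build_term_filters(index)
```

Every record identifier added to a filter is always reported as present; identifiers that were never added may occasionally be reported as present too, which is the usual false-positive trade-off of a Bloom filter.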


Abstract

The invention discloses a system and a method for rapidly searching unstructured data. The system comprises a query processing module, a query condition parsing module, a big data storage module, a Bloom filter filtering module, an index building module, and a Bloom filter building module. The query processing module receives a search request from an application client, extracts the query conditions from the request, and calls the query condition parsing module to parse and transform them: the word segmentation condition in the original query conditions is replaced with a Bloom filter, generating a new query condition that filters record identifiers through the Bloom filter. The data record table is then queried with the Bloom filter and the other query conditions; data records that satisfy all the query conditions are taken as the search result, which is returned to the application client. The system and method can judge more quickly whether a data record satisfies the word segmentation condition, and consume fewer resources.
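
The query flow in the abstract can be sketched roughly as follows. This is an illustrative reading rather than the patented implementation: the helper names (`make_filter`, `might_contain`, `rewrite_query`, `search`), the query dictionary layout, the hashing scheme, and the sample records are all hypothetical.

```python
import hashlib


def bit_positions(record_id, num_bits, num_hashes=3):
    """Bit positions for a record identifier (assumed hashing scheme)."""
    return [
        int.from_bytes(
            hashlib.sha256(f"{i}:{record_id}".encode("utf-8")).digest()[:8], "big"
        ) % num_bits
        for i in range(num_hashes)
    ]


def make_filter(record_ids, bits_per_record=10):
    """Build a Bloom filter as a (num_bits, bitmask) pair."""
    num_bits = max(8, bits_per_record * len(record_ids))
    mask = 0
    for record_id in record_ids:
        for pos in bit_positions(record_id, num_bits):
            mask |= 1 << pos
    return num_bits, mask


def might_contain(bloom, record_id):
    num_bits, mask = bloom
    return all(mask >> pos & 1 for pos in bit_positions(record_id, num_bits))


def rewrite_query(query, term_filters):
    """Replace each term condition with a Bloom filter test on the record id."""
    predicates = []
    for term in query["terms"]:
        bloom = term_filters.get(term)
        if bloom is None:
            # No record contains this term, so nothing can match.
            predicates.append(lambda record: False)
        else:
            predicates.append(
                lambda record, b=bloom: might_contain(b, record["id"])
            )
    predicates.extend(query["other_conditions"])
    return predicates


def search(data_table, predicates):
    """Scan the data record table, keeping records that pass every condition."""
    return [r for r in data_table if all(p(r) for p in predicates)]


# Hypothetical data record table.
data_table = [
    {"id": "r1", "year": 2013, "terms": ["bloom", "filter"]},
    {"id": "r2", "year": 2012, "terms": ["index"]},
    {"id": "r3", "year": 2011, "terms": ["bloom"]},
    {"id": "r4", "year": 2013, "terms": ["search"]},
]
postings = {}
for record in data_table:
    for term in record["terms"]:
        postings.setdefault(term, []).append(record["id"])
term_filters = {term: make_filter(ids) for term, ids in postings.items()}

query = {"terms": ["bloom"], "other_conditions": [lambda r: r["year"] >= 2013]}
results = search(data_table, rewrite_query(query, term_filters))
```

Because a Bloom filter can report false positives, a record that does not contain the term may occasionally pass the filter test. Whether and how the original system re-verifies such records is not stated in the text shown here.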

Description

Technical field

[0001] The application belongs to the field of information technology, and in particular relates to a system and method for fast retrieval of unstructured data in large-volume data storage.

Background

[0002] With the rapid development of the Internet, applications such as webpages, blogs, social networks, and instant messaging software have quickly become popular, and a large amount of unstructured text data has been generated. How to retrieve this text data quickly and effectively has become a research hotspot in the industry. Internet search engines perform word segmentation on text data to build inverted indexes from segmented terms to text identifiers, and use these indexes to implement term-based text retrieval. Although text data can be retrieved through the inverted index, when the user searches with multiple search conditions, or the search conditions contain multiple search terms, a large number of text...
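
To make the cost described in the background concrete, here is a small Python sketch of the conventional approach: answering a two-term query by intersecting two sorted posting lists while counting identifier comparisons. The function name `intersect_postings` and the sample posting lists are hypothetical, and the merge-style intersection is a common textbook technique rather than anything specified by the patent.

```python
def intersect_postings(a, b):
    """Intersect two sorted posting lists and count identifier comparisons."""
    i = j = comparisons = 0
    matches = []
    while i < len(a) and j < len(b):
        comparisons += 1
        if a[i] == b[j]:
            matches.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return matches, comparisons


# Hypothetical posting lists: text identifiers containing each term.
postings = {
    "bloom": [1, 3, 5, 7, 9],
    "filter": [2, 3, 4, 7, 8],
}
matches, comparisons = intersect_postings(postings["bloom"], postings["filter"])
```

The number of comparisons grows with the lengths of the posting lists, and each additional search term adds another intersection pass.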


Application Information

IPC(8): G06F17/30
Inventor: 周帅锋, 赵智峰, 曹俊亮, 穆宁
Owner: 南京烽火星空通信发展有限公司