Method and device for establishing NoSQL database index for semi-structured data

A technique in the field of semi-structured data and databases, applicable to semi-structured data indexing, semi-structured data retrieval, and digital data information retrieval. It addresses the problems of low system throughput, slow database index updates, and long search times, with the effect of improving query efficiency.

Active Publication Date: 2015-07-22
ALIBABA GRP HLDG LTD

AI Technical Summary

Problems solved by technology

[0009] However, existing NoSQL-based inverted indexes query data very inefficiently. When a user queries by keyword, the keyword must be looked up in the entire inverted index table, and the search time increases exponentially with the amount of data, resulting in low query efficiency.
[0010] In addition, existing NoSQL database indexes are updated too slowly. Taking the above-mentioned document retrieval system as an example: when the data of a new document is added in an existing NoSQL system, the original inverted index table must first be read to find the positions of the new document's keywords, and the document ID of the new document is then written into the inverted index table entries for those keywords. Because the content of the inverted index table must be read first, the update speed of the database index drops significantly; when the database is large, the update speed becomes unacceptable.
[0011] In summary, the existing way of building indexes over NoSQL-stored data suffers from low query efficiency and low update efficiency. System throughput is therefore low, and the system cannot handle the writing and querying of TB-scale documents.
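
The read-before-write pattern described in [0010] can be illustrated with a minimal sketch. The class `NaiveInvertedIndex`, its in-memory dictionary, and the whitespace tokenizer are hypothetical stand-ins for a NoSQL table and are not taken from the patent; the `reads` counter only makes the extra read per keyword visible.

```python
class NaiveInvertedIndex:
    """Hypothetical baseline: one row per keyword, shared by all documents."""

    def __init__(self):
        self.rows = {}   # keyword -> list of document IDs (stands in for a NoSQL table)
        self.reads = 0   # number of row reads performed during updates

    def add_document(self, doc_id, text):
        # Assumed tokenizer: lowercase, split on whitespace, deduplicate.
        for keyword in set(text.lower().split()):
            # Read the existing row first (the step that slows updates down) ...
            self.reads += 1
            posting = list(self.rows.get(keyword, []))
            # ... then write the enlarged row back.
            posting.append(doc_id)
            self.rows[keyword] = posting

    def query(self, keyword):
        # A keyword query touches the single global row for that keyword.
        return self.rows.get(keyword.lower(), [])


index = NaiveInvertedIndex()
index.add_document(1, "error disk full")
index.add_document(2, "disk ok")
print(index.query("disk"), index.reads)
```

In this sketch every keyword of every new document costs one row read before the write, and the row for a frequent keyword keeps growing with the total amount of data.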



Examples


Embodiment Construction

[0038] The technical solutions in the embodiments of the application are described clearly and completely below with reference to the drawings in the embodiments of the application. The described embodiments are only some of the embodiments of the application, not all of them. All other embodiments obtained by persons of ordinary skill in the art based on the embodiments in this application fall within the protection scope of this application.

[0039] See Figure 1, which is a flow chart of the first embodiment of a method for constructing a NoSQL-based index for semi-structured data in the present application. The application environment of this embodiment is building a NoSQL database index for the logs of a website with multiple servers. For server logs, the structural clue is the time at which the log was generated.
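
As a minimal sketch of how a log's generation time could act as the structural clue, the function below maps a timestamp to the start of the fixed-length interval that contains it. The one-hour interval length and the name `interval_key` are assumptions made for illustration; the text here does not state how the intervals are chosen.

```python
from datetime import datetime, timezone

INTERVAL_SECONDS = 3600  # assumed interval length (one hour), not specified in the text


def interval_key(ts: datetime, interval_seconds: int = INTERVAL_SECONDS) -> int:
    """Return the epoch second at which the interval containing `ts` starts."""
    epoch = int(ts.timestamp())
    return epoch - epoch % interval_seconds


first = datetime(2015, 7, 22, 10, 15, 0, tzinfo=timezone.utc)
second = datetime(2015, 7, 22, 10, 59, 59, tzinfo=timezone.utc)
third = datetime(2015, 7, 22, 11, 0, 0, tzinfo=timezone.utc)

# Logs from the same hour share a key; the next hour gets the next consecutive key.
print(interval_key(first) == interval_key(second))
print(interval_key(third) - interval_key(first))
```

Logs generated within the same interval then share one key value, and consecutive intervals receive consecutive, ordered keys.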

[0040] In this embodiment, the following steps are included:

[0041] Step S110: Preprocessing the log texts generated by each server including the g...


Abstract

Semi-structured source data is preprocessed to obtain text partitions to be stored into a data table with a first combined primary key including a structure thread primary key and a sequence value primary key. The structure thread primary key identifies a structure thread that is segmented into several consecutive intervals according to a determined or predetermined sequence. An inverted index table, created for the preprocessed text partitions, includes a second combined primary key including the structure thread primary key and a keyword primary key. Corresponding to values of the primary keys in the second combined primary key, related text partition sequence IDs are recorded as index values of the inverted index table. Index values having a same keyword primary key value but different structure thread primary key values are located in different rows in the inverted index table. The present techniques improve query efficiency of database index and facilitate updating.
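
The two tables described in the abstract can be sketched in memory as follows. This is an illustration under stated assumptions, not the patented implementation: the class `PartitionedLogIndex`, the per-thread sequence counter, the whitespace tokenizer, and the use of Python dictionaries in place of NoSQL tables are all hypothetical.

```python
from collections import defaultdict


class PartitionedLogIndex:
    """Hypothetical in-memory model of the data table and inverted index table."""

    def __init__(self):
        # Data table: first combined primary key (structure thread, sequence value).
        self.data_table = {}
        # Inverted index: second combined primary key (structure thread, keyword),
        # with text partition sequence IDs as index values.
        self.inverted = defaultdict(list)
        # Assumed scheme: sequence values count up independently per structure thread.
        self._next_seq = defaultdict(int)

    def add(self, thread_key, text):
        seq = self._next_seq[thread_key]
        self._next_seq[thread_key] += 1
        self.data_table[(thread_key, seq)] = text
        # Assumed tokenizer: lowercase, split on whitespace, deduplicate.
        for keyword in set(text.lower().split()):
            self.inverted[(thread_key, keyword)].append(seq)
        return seq

    def query(self, keyword, thread_keys):
        """Look up a keyword only in the rows of the requested structure threads."""
        hits = []
        for thread_key in thread_keys:
            for seq in self.inverted.get((thread_key, keyword.lower()), []):
                hits.append(self.data_table[(thread_key, seq)])
        return hits


index = PartitionedLogIndex()
index.add(0, "disk full")
index.add(0, "disk ok")
index.add(1, "disk error")
print(index.query("disk", [0]))
print(index.query("disk", [1]))
```

Because the structure thread is part of the key, the same keyword in different intervals lands in different rows, so a query restricted to some intervals reads only those rows and new data for a new interval is written to new rows.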

Description

Technical field

[0001] The present application relates to the field of computer application technology, in particular to a method and device for constructing NoSQL database indexes for semi-structured data.

Background

[0002] A database management system (DBMS) is software for manipulating and managing databases, used to establish, use, and maintain them. It manages and controls the database in a unified way to ensure the security and integrity of the database.

[0003] With the advent of the big data era, the volume of transaction and interaction data keeps increasing. Processing data at the TB level has become a basic requirement, and data types have changed from uniform to diverse, for example structured data, unstructured data, and semi-structured data. Structured data usually refers to data such as that of enterprise ERP and financial systems; unstructured data refers to voice, image, video, and similar data; sem...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
CPC: G06F17/30911, G06F17/30339, G06F16/81, G06F16/2282, G06F16/319
Inventor: 周琦, 孙廷韬, 蔡华, 林豪
Owner: ALIBABA GRP HLDG LTD