
Large-index quick splitting method based on Lucene

A fast index-sharding technique, applied in database indexing, structured data retrieval, and special-purpose data processing applications. It addresses the long duration and high overhead of existing shard-splitting processes, reducing system I/O pressure, eliminating unpredictability, and speeding up index splitting.

Pending Publication Date: 2020-01-21
南京录信软件技术有限公司

AI Technical Summary

Problems solved by technology

However, this process is very time-consuming when the amount of data is large, and if the original data is modified during the split, those modifications may be lost. Additional measures are required to keep the data safe and up to date.
[0005] Disadvantages of the existing technology:
1. Existing index splitting copies the index data directly to the new shard. When the amount of data is large, this extra copy is expensive and unnecessary.
2. After a new number of shards is set, all of the data may have to be rearranged. If the amount of data is large, this is very time-consuming.
3. Existing index splitting relies on a sharding algorithm to locate data. The number of shards is part of that algorithm, so changing the number of shards is very costly.
4. The growth of project data is unpredictable, so it is difficult to choose the right number of shards in advance.
5. If an index is split while it is in use, the split takes a long time. If the original data is modified during the split, these modifications may be lost.
6. If the shard is locked before the split so that it cannot be modified until the split completes, the long split causes a large number of failed requests on the calling server.
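
Disadvantages 2 and 3 can be illustrated with a minimal sketch. It assumes a hash-modulo routing rule, which is only one common choice; the source does not name the algorithm existing systems use, and `shard_for`, `moved_fraction`, and the `doc-N` keys are hypothetical names introduced for illustration.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Route a key to a shard: stable hash of the key, modulo the shard count."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def moved_fraction(keys, old_shards: int, new_shards: int) -> float:
    """Fraction of keys whose shard changes when the shard count changes."""
    moved = sum(
        1 for k in keys if shard_for(k, old_shards) != shard_for(k, new_shards)
    )
    return moved / len(keys)


# Hypothetical document keys, used only for this illustration.
keys = [f"doc-{i}" for i in range(10_000)]
fraction = moved_fraction(keys, 4, 5)
print(f"{fraction:.0%} of keys change shard when going from 4 to 5 shards")
```

Under this assumed rule a key keeps its shard only when its hash gives the same remainder modulo 4 and modulo 5, which holds for 4 out of every 20 consecutive hash values. For uniformly distributed hashes, about four in five keys therefore have to move when a single shard is added.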




Embodiment Construction

[0022] The following will clearly and completely describe the technical solutions in the embodiments of the present invention with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. Based on the embodiments of the present invention, all other embodiments obtained by persons of ordinary skill in the art without making creative efforts belong to the protection scope of the present invention.

[0023] In the description of the present invention, it should be noted that the orientation or positional relationship indicated by terms such as "vertical", "upper", "lower", and "horizontal" is based on the orientation or positional relationship shown in the drawings. These terms are used only to facilitate and simplify the description of the present invention; they do not indicate or imply that the device or element referred to must hav...


Abstract

The invention discloses a method for quickly splitting large indexes based on Lucene, comprising the following steps. First, using the soft link (symbolic link) mechanism for files under Linux, a link is created in the newly built index shard directory that points to the storage location of the original index files. Second, using Lucene's incremental deletion characteristic, half of the specified index data is deleted on the newly built index shard and the complementary half is deleted on the other index shard, which completes the split of one index into two. Third, once the split is complete, the storage directory for a given piece of index data is determined from the deletion condition used in the delete operation, and subsequent data is stored accordingly. With this method the split needs no extra copy overhead, deleting the specified index data is efficient, and the index splitting process is accelerated. After the split, the rule for locating where subsequent data is stored follows directly from the deletion condition, so no additional algorithm is needed, which makes the method simple, convenient, and fast.
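
The three steps in the abstract can be sketched as follows. This is a plain-Python model, not Lucene code: a JSON file stands in for the index files, a small `delete_condition.json` marker stands in for the delete operation, and the hash-parity condition `in_second_half` is an assumption chosen for illustration. All function and file names here are hypothetical, and the sketch assumes a platform where `os.symlink` is permitted (for example Linux).

```python
import hashlib
import json
import os
import tempfile


def in_second_half(doc_id: str) -> bool:
    """Assumed deletion condition: hash parity splits ids into two groups."""
    return hashlib.md5(doc_id.encode("utf-8")).digest()[0] % 2 == 1


def split_index(original_dir: str, shard_a: str, shard_b: str) -> None:
    """Create two shard directories that link to the original files (no copy)."""
    for shard_dir, delete_second in ((shard_a, True), (shard_b, False)):
        os.makedirs(shard_dir)
        for name in os.listdir(original_dir):
            # Soft link: the shard points at the original file, no data is copied.
            target = os.path.abspath(os.path.join(original_dir, name))
            os.symlink(target, os.path.join(shard_dir, name))
        # Record which half this shard treats as deleted.
        with open(os.path.join(shard_dir, "delete_condition.json"), "w") as f:
            json.dump({"delete_second_half": delete_second}, f)


def live_docs(shard_dir: str) -> dict:
    """Documents a shard still serves after applying its deletion condition."""
    with open(os.path.join(shard_dir, "delete_condition.json")) as f:
        delete_second = json.load(f)["delete_second_half"]
    with open(os.path.join(shard_dir, "docs.json")) as f:  # read via the symlink
        docs = json.load(f)
    return {i: d for i, d in docs.items() if in_second_half(i) != delete_second}


def route(doc_id: str, shard_a: str, shard_b: str) -> str:
    """Send a new document to the shard that did not delete its half."""
    return shard_b if in_second_half(doc_id) else shard_a


# Build a toy "index" and split it into two shards.
root = tempfile.mkdtemp()
original = os.path.join(root, "index")
os.makedirs(original)
docs = {f"doc-{i}": f"text {i}" for i in range(100)}
with open(os.path.join(original, "docs.json"), "w") as f:
    json.dump(docs, f)

shard_a = os.path.join(root, "shard_a")
shard_b = os.path.join(root, "shard_b")
split_index(original, shard_a, shard_b)
a, b = live_docs(shard_a), live_docs(shard_b)
print(len(a), len(b), len(a) + len(b))
```

In this model the two shards serve disjoint sets that together cover the original documents, and `route` reuses the same condition that drove the deletion, so no separate placement algorithm is introduced. How the actual method records deletions inside Lucene is not shown in the source and is not modeled here.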

Description

Technical field

[0001] The invention relates to the technical field of file indexing, in particular to a method for quickly splitting large indexes based on Lucene.

Background

[0002] With the advent of the big data era, the amount of data is growing explosively. Indexing data when it is stored greatly improves retrieval performance, but indexing has a price. First, an index occupies physical space, and the index files grow as the data grows. Second, creating and maintaining an index takes time, and that time increases with the amount of data. Third, when data is added, deleted, or modified, the index must be maintained dynamically; the more data there is, the larger the index files and the lower the maintenance efficiency.

[0003] If there is only one index file or the number of shards in...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/22
CPC: G06F16/22, G06F16/2272
Inventor: 王帅
Owner: 南京录信软件技术有限公司