
File storage method based on HDFS storage with Lucene as the index

A technology for indexing and storing files, applied to text database indexing, text database querying, and special data processing applications. It addresses long interaction delays, frequent operations, and high pressure on the namenode, with the effect of reducing the number of reads and writes, reducing the file count, and improving performance.

Pending Publication Date: 2021-01-15
南京好鱼科技有限公司

AI Technical Summary

Problems solved by technology

The current system has the following problems: because of the large number of files, the LS (enumeration) operation takes a long time; each Open operation interacts with the namenode with a long delay, which reduces query performance; HDFS (the distributed file system) holds many files, so the pressure on the namenode is extremely high and query performance declines; and the index files are read many times, so operations are too frequent.


Examples


Embodiment Construction

[0024] The present invention will be further described in detail below in conjunction with specific examples.

[0025] A file storage method based on HDFS storage and using Lucene as the index, specifically as follows:

Step S1: first, build an index system and obtain each index file that is continuously produced by updates during the maintenance of the Lucene distributed index system.

Step S2: then, screen and classify the index files obtained in step S1, and merge them, so that the scattered small index files are combined into a single index file.

Step S3: among the files from step S2, those of type tim, tip, doc, dvd, dvm, fdx, pay and pos are all merged into one file.

Step S4: store the files from step S3 according to the ordering rules.

Step S5: perform data verification on the data stored in step S4; once verification confirms that the merged file can be read normally, it is finally stored, which completes the file storage process.
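Steps S2 to S5 can be illustrated with a minimal sketch. It is not the patent's implementation: it uses the local filesystem as a stand-in for HDFS, and the function names (select_files, order_key, merge, read_entry, verify), the ordering rule (segment name, then type), the in-memory offset table, and the SHA-256 check are all assumptions made for illustration. Only the list of eight file types comes from the text above.

```python
import hashlib
import os
import tempfile

# File types that step S3 says are merged into one file.
MERGE_TYPES = ("tim", "tip", "doc", "dvd", "dvm", "fdx", "pay", "pos")


def select_files(names):
    """Step S2 (screening): keep only files whose extension is in MERGE_TYPES."""
    return [n for n in names if n.rsplit(".", 1)[-1] in MERGE_TYPES]


def order_key(name):
    """Step S4 (assumed ordering rule): segment name first, then type order."""
    stem, ext = name.rsplit(".", 1)
    return (stem, MERGE_TYPES.index(ext))


def merge(src_dir, names, out_path):
    """Concatenate the selected files into out_path.

    Returns a table of (name, offset, length, sha256) entries so each
    original file can still be located inside the merged file.
    """
    table = []
    with open(out_path, "wb") as out:
        for name in sorted(select_files(names), key=order_key):
            with open(os.path.join(src_dir, name), "rb") as src:
                data = src.read()
            digest = hashlib.sha256(data).hexdigest()
            table.append((name, out.tell(), len(data), digest))
            out.write(data)
    return table


def read_entry(out_path, entry):
    """Read one original file back out of the merged file by offset."""
    _name, offset, length, _digest = entry
    with open(out_path, "rb") as merged:
        merged.seek(offset)
        return merged.read(length)


def verify(out_path, table):
    """Step S5 (assumed check): every entry reads back with the same digest."""
    for entry in table:
        data = read_entry(out_path, entry)
        if len(data) != entry[2]:
            return False
        if hashlib.sha256(data).hexdigest() != entry[3]:
            return False
    return True
```

In this sketch a reader opens one merged file and seeks to an offset instead of opening eight separate files, which is the kind of reduction in Open calls and file count the method aims for. A real deployment would need to persist the offset table and use an HDFS client in place of the local open() calls.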

[0026] The file da...



Abstract

The invention relates to a file storage method based on HDFS storage and using Lucene as the index, in the technical field of file indexing. The method comprises the following steps: first, establishing an index system and obtaining each index file continuously produced by updates during the maintenance of a Lucene distributed index system; then screening and classifying the obtained index files and merging them, so that the scattered small index files are combined into a single index file; merging the files of type tim, tip, doc, dvd, dvm, fdx, pay and pos into one file; storing the files according to an ordering rule; and performing data verification on the stored data, with final storage carried out once the verified, merged file can be read normally, which completes the storage processing of the file. The invention provides a file storage method based on HDFS storage and using Lucene as the index that reduces the frequency of file reads and writes, processes and stores files efficiently, and shortens operation delay.

Description

Technical field

[0001] The invention relates to the technical field of file indexing, in particular to a file storage method based on HDFS storage and using Lucene as the index.

Background

[0002] In recent years, as technologies such as the Internet of Things, social networks, and cloud computing have become part of everyday life, and as computing power, storage space, and network bandwidth have developed rapidly, the data accumulated in fields such as the Internet, communications, finance, commerce, and medical care has kept growing. People hope not only to extract valuable information from big data, but also to discover deeper laws that can effectively support decision-making in production and life; however, how to obtain more valuable information from hundreds of millions of terabytes of data is a key problem that science and technology workers have been thinking about and...


Application Information

IPC(8): G06F16/31, G06F16/33, G06F16/182
CPC: G06F16/182, G06F16/316, G06F16/334
Inventor: 母延年
Owner: 南京好鱼科技有限公司