Index generation method and index generation device based on MapReduce programming architecture

An index generation method and device, applied in the field of Internet information technology, that addresses problems such as reduced read and write efficiency and difficulty in cluster expansion.

Active Publication Date: 2012-04-25
XIAMEN MEIYA PICO INFORMATION
5 Cites, 54 Cited by

AI Technical Summary

Problems solved by technology

[0003] 1. Index files are poorly suited to storage in a cluster file system: they consist of large batches of small files, and placing them on a cluster file system greatly reduces read and write efficiency;
[0004] 2. Cluster expansion is difficult. When a large amount of data is indexed at the same time, the frequent creation and merging of new fragments tends to increase the response time of the index engine and reduce efficiency.
Although this process may appear less efficient than serial computing, a MapReduce system can handle volumes of data that general-purpose servers cannot.


Embodiment Construction

[0064] The flow of the index generation method S100 based on the MapReduce programming architecture, according to the preferred embodiment of the present invention, is shown in Figure 1. The method S100 may be implemented in hardware, software, or a combination of the two. It starts at step S110. In step S120, data is acquired, organized into a unified format, and stored as a record collection; in step S130, head encapsulation is performed on each data record in the record collection; in step S140, the head-encapsulated data records are inserted into the HBase cluster in parallel and in batches; in step S150, the MapReduce service and the HBase service in the Hadoop cluster are called, a connection to the Solr cluster is made, and the cluster status is confirmed; in step S160, the Map operation is performed on the data records in the HBase cluster, and the parallel index generation task is submitted and run to form the inverted index intermediate file; in step S170, the Reduce operation is performed on the data reco...
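
Steps S120 to S140 can be illustrated with a minimal Python sketch. The text above does not say which fields the head contains or how batches are sized, so the header fields (`row_key`, `source`, `length`), the `batch_size` parameter, and the `InMemoryTable` class are all hypothetical. `InMemoryTable` is a plain dictionary standing in for an HBase table, not an HBase client API, and the batches are written sequentially here rather than in parallel.

```python
import json
from itertools import islice


def encapsulate(record, source):
    """Wrap one unified-format record with a head (hypothetical fields)."""
    body = json.dumps(record, sort_keys=True, ensure_ascii=False)
    header = {
        "row_key": f"{source}:{record['id']}",  # assumed row-key scheme
        "source": source,
        "length": len(body.encode("utf-8")),    # body size in bytes
    }
    return {"header": header, "body": body}


def batches(items, size):
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


class InMemoryTable:
    """Stand-in for an HBase table; NOT a real HBase client."""

    def __init__(self):
        self.rows = {}
        self.batch_count = 0

    def put_batch(self, batch):
        # One call per batch, mimicking a batched insert.
        for item in batch:
            self.rows[item["header"]["row_key"]] = item
        self.batch_count += 1


def load(records, source, table, batch_size=2):
    """Encapsulate every record, then insert the results batch by batch."""
    encapsulated = (encapsulate(r, source) for r in records)
    for batch in batches(encapsulated, batch_size):
        table.put_batch(batch)
    return table
```

In a real deployment each batch would presumably be handed to a separate worker so that the inserts proceed in parallel, as step S140 describes.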

Abstract

The invention relates to an index generation method and an index generation device based on a MapReduce programming architecture. The index generation method comprises the following steps: acquiring data, organizing the data into a unified format, and storing it in the form of a record set; carrying out head encapsulation on each data record in the record set; inserting the head-encapsulated data records into an HBase cluster in batches; calling a MapReduce service and an HBase service in a Hadoop cluster and connecting to a Solr cluster; carrying out a Map operation and submitting and running a parallel index generation task to form an inverted index intermediate file; carrying out a Reduce operation to generate an inverted index file; and starting a new Map task to carry out a split operation on the inverted index file to generate the final index. The disclosed index generation method and index generation device enable efficient distributed storage of mass data and the building of indexes over it, and offer advantages such as scalability, high fault tolerance, and high performance.
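
The Map, Reduce, and split stages named in the abstract can be sketched in plain Python as follows. This is only an illustration of the general inverted-index technique, not the patented implementation: it runs in a single process without Hadoop, HBase, or Solr, and the whitespace tokenizer, the CRC32-based shard assignment, and every function name are assumptions introduced for the example.

```python
import zlib
from collections import defaultdict


def map_phase(doc_id, text):
    """Map: emit one (term, doc_id) pair per distinct term in a record."""
    for term in sorted(set(text.lower().split())):  # naive tokenizer (assumed)
        yield term, doc_id


def shuffle(pairs):
    """Group intermediate pairs by term, as the framework would do."""
    grouped = defaultdict(list)
    for term, doc_id in pairs:
        grouped[term].append(doc_id)
    return grouped


def reduce_phase(term, doc_ids):
    """Reduce: merge all doc ids for a term into one sorted posting list."""
    return term, sorted(set(doc_ids))


def split_index(index, num_shards):
    """Split the inverted index into shards by a stable hash of the term."""
    shards = [dict() for _ in range(num_shards)]
    for term, postings in index.items():
        shard_id = zlib.crc32(term.encode("utf-8")) % num_shards  # assumed scheme
        shards[shard_id][term] = postings
    return shards


def build_index(records, num_shards=2):
    """Run map, shuffle, reduce, and split over a {doc_id: text} mapping."""
    intermediate = [
        pair
        for doc_id, text in records.items()
        for pair in map_phase(doc_id, text)
    ]
    grouped = shuffle(intermediate)
    index = dict(reduce_phase(term, ids) for term, ids in grouped.items())
    return index, split_index(index, num_shards)
```

Calling `build_index({1: "hadoop stores data", 2: "solr indexes data"})` returns the full inverted index together with its shards; the term `data` maps to both records, while each other term maps to one. Hashing on the term keeps every posting list whole within a single shard, which is one simple way to split an index; the source does not state which criterion the split operation uses.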

Description

Technical field

[0001] The present invention relates to the field of Internet information technology, and more specifically, to an index generation method and device based on a MapReduce programming architecture.

Background technique

[0002] Traditional index engines (such as Lucene and the Lucene-based Solr) create and manage indexes in a file-based way, which has several disadvantages:
[0003] 1. Index files are poorly suited to storage in a cluster file system: they consist of large batches of small files, and placing them on a cluster file system greatly reduces read and write efficiency;
[0004] 2. Cluster expansion is difficult. When a large amount of data is indexed at the same time, the frequent creation and merging of new fragments tends to increase the response time of the index engine and reduce efficiency.
[0005] Based on the above analysis, it can be found that it is imperative to introduce an external management...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
Inventor: 兰轶伦, 汤伟宾, 章正道
Owner: XIAMEN MEIYA PICO INFORMATION