Index generation method and index generation device based on MapReduce programming architecture

An index generation method and device, applied in the field of Internet information technology, that addresses problems such as reduced read and write efficiency, reduced indexing efficiency, and difficulty in cluster expansion
CN102426609A · Active · Publication Date: 2012-04-25 · XIAMEN MEIYA PICO INFORMATION

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
XIAMEN MEIYA PICO INFORMATION
Publication Date
2012-04-25


Abstract

The invention relates to an index generation method and an index generation device based on the MapReduce programming architecture. The index generation method comprises the following steps: acquiring data, converting the data into a unified format, and storing the prepared data as a record set; performing header encapsulation on each data record in the record set; inserting the encapsulated data records into an HBase cluster in batches; calling the MapReduce service and the HBase service in a Hadoop cluster and connecting to a Solr cluster; performing a MapReduce operation and submitting a parallel index generation task to form an intermediate inverted index file; performing a Reduce operation to generate an inverted index file; and starting a new Map task to perform a split operation on the inverted index file to generate the final index. The disclosed method and device enable efficient distributed storage of massive data and efficient index building, and offer advantages such as scalability, high fault tolerance, and high performance.
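The map, reduce, and split steps listed in the abstract can be illustrated with a minimal single-process sketch. It is not the patented implementation and uses no Hadoop, HBase, or Solr calls. The function names (`encapsulate`, `map_phase`, `reduce_phase`, `split_index`, `build_index`), the header fields, and the hash-based sharding rule are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def encapsulate(record_id, text):
    """Header encapsulation: wrap a raw record with a small header.

    The header fields here (id, length) are assumptions, not from the patent.
    """
    return {"header": {"id": record_id, "length": len(text)}, "body": text}


def map_phase(record):
    """Map: emit one (term, record id) pair per distinct term in the body."""
    doc_id = record["header"]["id"]
    for term in sorted(set(record["body"].lower().split())):
        yield term, doc_id


def reduce_phase(pairs):
    """Reduce: group the intermediate pairs by term into sorted posting lists."""
    index = defaultdict(list)
    for term, doc_id in sorted(pairs):
        index[term].append(doc_id)
    return dict(index)


def split_index(index, num_shards):
    """Split: partition the inverted index into shards.

    Sharding by the byte sum of the term is an assumed rule; it is stable
    across runs, unlike Python's built-in hash() for strings.
    """
    shards = [{} for _ in range(num_shards)]
    for term, postings in index.items():
        shard_no = sum(term.encode("utf-8")) % num_shards
        shards[shard_no][term] = postings
    return shards


def build_index(raw_records, num_shards=2):
    """Run the whole sketch: encapsulate, map, reduce, then split."""
    records = [encapsulate(i, text) for i, text in enumerate(raw_records)]
    intermediate = [pair for rec in records for pair in map_phase(rec)]
    return split_index(reduce_phase(intermediate), num_shards)


if __name__ == "__main__":
    for shard in build_index(["hadoop stores data", "solr indexes data"]):
        print(shard)
```

In a real cluster the intermediate pairs would be written to files and shuffled between nodes rather than held in a list; the sketch keeps everything in memory so that the data flow between the steps stays visible.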

Description

Technical field

[0001] The present invention relates to the field of Internet information technology, and more specifically, to an index generation method and device based on a MapReduce programming architecture.

Background

[0002] Traditional index engines (such as Lucene and the Lucene-based Solr) create and manage indexes as files, which has several disadvantages:

[0003] 1. The indexes are not suited to storage on a cluster file system, because an index still consists of large numbers of small files, and placing them on a cluster file system greatly reduces read and write efficiency;

[0004] 2. Cluster expansion is difficult. When a large amount of data is indexed at the same time, the frequent creation and merging of new index segments tends to increase the response time of the index engine and reduce efficiency.

[0005] Based on the above analysis, it can be found that it is imperative to introduce an external management...
