Distributed inverted index organization method based on user log analysis

An inverted index and user log technology, applied in the fields of special data processing applications, instruments, and electrical digital data processing. It addresses the problems that existing designs are simple, do not consider the difference in query frequency between words, and give an unsatisfactory overall effect. It achieves the effects of avoiding computing overhead, balancing query response speed, and ensuring load

Status: Inactive
Publication Date: 2012-10-10
ZHEJIANG UNIV
Cites: 2; Cited by: 29

AI Technical Summary

Problems solved by technology

Existing "hybrid segmentation" methods are generally simple in design and do not consider the difference in query frequency between words, so their overall effect is not ideal.



Embodiment Construction

[0052] As shown in Figure 1, the overall system architecture of this embodiment consists of two parts: index establishment and query routing. The query log processing module analyzes the query logs, extracts high-frequency words, and clusters them according to established parameters; the resulting clusters are then assigned to the nodes of the index cluster according to the objective function, and each node builds its index. The query processing module receives requests from the query front end, updates the query logs, and selects appropriate nodes for each query according to the global index and the current load of each node. The implementation steps of the distributed inverted index organization method based on user log analysis in this embodiment are as follows:
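The node selection performed by the query processing module can be illustrated with a minimal Python sketch. It is not the patented procedure: the function name `route_query`, the shape of `global_index` (each word mapped to a list of candidate nodes), the `node_load` metric, and the rule of reusing an already chosen node before falling back to the least-loaded candidate are all assumptions made for illustration.

```python
def route_query(terms, global_index, node_load):
    """Pick one node per query word and group the words by node.

    global_index: dict word -> list of node ids holding that word's postings
                  (several candidates per word is an assumption).
    node_load:    dict node id -> current load figure (hypothetical metric).
    Returns a dict node id -> list of words to look up on that node.
    """
    plan = {}
    for term in terms:
        candidates = global_index.get(term, [])
        if not candidates:
            continue  # the word is not indexed on any node
        # Prefer a node already in the plan so the query touches fewer nodes.
        reused = [n for n in candidates if n in plan]
        pool = reused or candidates
        # Among the remaining choices take the least-loaded node
        # (ties broken by node id so the result is deterministic).
        node = min(pool, key=lambda n: (node_load[n], n))
        plan.setdefault(node, []).append(term)
    return plan
```

For example, if words `x` and `z` are held on nodes 0 and 1, word `y` only on node 1, and the loads are `{0: 5, 1: 2}`, then `route_query(["x", "y", "z"], ...)` sends all three words to node 1, so a single node answers the query.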

[0053] 1) Analyze the user query logs, extract high-frequency words and non-high-frequency words, establish a correlation matrix of the high-frequency words, and establish a high-frequenc...
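A minimal Python sketch of step 1) follows. The source text is truncated here, so the details are assumptions rather than the patent's definitions: the hypothetical `top_k` cutoff decides which words count as high-frequency, and the "correlation" of two words is taken to be the number of queries in which both appear.

```python
from collections import Counter
from itertools import combinations


def analyze_query_log(queries, top_k):
    """Split words by query frequency and build a co-occurrence graph.

    queries: list of queries, each a list of word strings.
    top_k:   hypothetical cutoff; the top_k most-queried words are
             treated as high-frequency, all others as non-high-frequency.
    """
    # Count each word once per query in which it occurs.
    freq = Counter(term for q in queries for term in set(q))
    # Rank by descending count, then alphabetically, so ties are deterministic.
    ranked = sorted(freq, key=lambda t: (-freq[t], t))
    high = set(ranked[:top_k])
    low = set(ranked[top_k:])

    # Assumed correlation measure: number of queries containing both words.
    cooc = Counter()
    for q in queries:
        present = sorted(set(q) & high)
        for a, b in combinations(present, 2):
            cooc[(a, b)] += 1

    # Relation graph as a symmetric adjacency dict: word -> {neighbour: weight}.
    graph = {t: {} for t in high}
    for (a, b), weight in cooc.items():
        graph[a][b] = weight
        graph[b][a] = weight
    return high, low, freq, graph
```

The adjacency dict plays the role of the correlation matrix in sparse form; a dense matrix would work equally well when the number of high-frequency words is small.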


Abstract

The invention discloses a distributed inverted index organization method based on user log analysis. The method comprises the following steps: 1) analyzing the user query logs, extracting high-frequency words and non-high-frequency words, establishing a correlation matrix of the high-frequency words, and establishing a high-frequency word relation graph according to the correlations between the high-frequency words; 2) calculating the load of each high-frequency word, and clustering the high-frequency words according to the relation graph and the loads; 3) distributing the clusters to nodes and establishing a high-frequency word index, then hashing the non-high-frequency words to the nodes and establishing a non-high-frequency word index; 4) establishing a global index table according to the high-frequency word index and the non-high-frequency word index, and routing queries according to the global index table. The method has a low query cost, high query efficiency, and good query performance; it balances the throughput of the entire system against the response speed of each query, and fewer nodes are involved when a query contains multiple words.
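Steps 2) to 4) of the abstract can be sketched in Python as follows. This is an illustration under stated assumptions, not the patent's algorithm: the clustering is a simple greedy merge of the most strongly related words under a hypothetical `max_cluster_load` cap, the clusters are placed heaviest-first on the least-loaded node (a standard greedy load-balancing heuristic swapped in here), and non-high-frequency words are hashed with CRC32. The function and parameter names are all hypothetical.

```python
import zlib


def cluster_terms(graph, term_load, max_cluster_load):
    """Greedily merge strongly related words while a load cap holds.

    graph: symmetric adjacency dict, word -> {neighbour: correlation}.
    """
    parent = {t: t for t in graph}
    load = {t: term_load[t] for t in graph}

    def find(t):
        # Union-find lookup with path halving.
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    # Visit each undirected edge once, strongest correlation first.
    edges = sorted(
        ((w, a, b) for a in graph for b, w in graph[a].items() if a < b),
        key=lambda e: (-e[0], e[1], e[2]),
    )
    for _, a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb and load[ra] + load[rb] <= max_cluster_load:
            parent[rb] = ra
            load[ra] += load[rb]

    clusters = {}
    for t in graph:
        clusters.setdefault(find(t), set()).add(t)
    return list(clusters.values())


def build_global_index(clusters, term_load, low_terms, num_nodes):
    """Place clusters on nodes, hash the remaining words, return the table."""
    node_load = [0.0] * num_nodes
    table = {}

    def cluster_load(cluster):
        return sum(term_load[t] for t in cluster)

    # Heaviest cluster first, each onto the currently least-loaded node.
    ordered = sorted(clusters, key=lambda c: (-cluster_load(c), sorted(c)))
    for cluster in ordered:
        node = min(range(num_nodes), key=lambda n: (node_load[n], n))
        node_load[node] += cluster_load(cluster)
        for t in cluster:
            # Stored as a list so a word could later be given replicas
            # (an assumption; the source does not describe replication).
            table[t] = [node]

    # Non-high-frequency words: stable hash of the word modulo node count.
    for t in low_terms:
        table[t] = [zlib.crc32(t.encode("utf-8")) % num_nodes]
    return table, node_load
```

Keeping the words of one cluster on the same node is what lets a multi-word query be answered by fewer nodes, while the load cap and the least-loaded placement aim at balanced throughput.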

Description

Technical field

[0001] The invention relates to the technical field of computer information retrieval, and in particular to a distributed inverted index organization method based on user log analysis.

Background

[0002] With the continuous development of Internet technology, a large amount of information is generated every day, and this information is often presented as unstructured data such as web pages, pictures, videos, and audio. Faced with such a vast amount of data, finding information that meets one's needs is like finding a needle in a haystack. Therefore, in this era of massive information, obtaining useful information quickly and efficiently requires relying on various Information Retrieval Systems (IRS). The main purpose of an IRS is to provide people with effective information services. It is established according to specific information needs and realizes the programmed system of informat...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
Inventors: 陈岭, 李卓豪
Owner: ZHEJIANG UNIV