Massive web log data query and analysis method

A data query and analysis method applied to network data retrieval, network data indexing, and electronic digital data processing. It addresses inaccurate data analysis results and long retrieval delays, and aims for accurate results, support for data mining, scalability, and efficiency.

Inactive Publication Date: 2015-01-21
北京智融时代信息技术有限公司

AI Technical Summary

Problems solved by technology

Existing big data query methods can only search row keys directly through HBase or retrieve data with Hive's HQL. Retrieval latency is high and the data analysis results are inaccurate, so these methods cannot meet current needs.




Embodiment Construction

[0024] The technical solutions provided by the present invention will be described in detail below in conjunction with specific examples. It should be understood that the following specific embodiments are only used to illustrate the present invention and are not intended to limit the scope of the present invention.

[0025] Visitors leave traces as they browse a website, and these traces are saved as web log files. To provide accurate log query and analysis results for this data, this example uses ETL in Hive, optimized Hive SQL queries, MapReduce with a combiner function, and a genetic algorithm based on data segmentation. As shown in Figure 1, the specific steps of the method are as follows:

[0026] Step 10: use ETL in Hive to analyze the data of each data source. The ETL process comprises four steps: data extraction, cleaning, transformation, and loading. In the extraction stage, the so...
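The four ETL stages named in Step 10 can be illustrated with a minimal Python sketch. This is not the patent's Hive implementation: the input layout (Apache Common Log Format), the output columns, and all function names (`extract`, `clean`, `transform`, `load`, `run_etl`) are assumptions made for illustration only.

```python
import re
from datetime import datetime

# Assumption: input lines follow the Apache Common Log Format.
# The source does not specify the log layout.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\d+|-)'
)

# Hypothetical output column order for the loaded rows.
COLUMNS = ("ip", "day", "method", "path", "status", "bytes")


def extract(lines):
    """Extraction: read raw lines from a source, skipping empty ones."""
    for line in lines:
        line = line.strip()
        if line:
            yield line


def clean(raw):
    """Cleaning: return the parsed fields, or None for malformed lines."""
    match = LOG_PATTERN.match(raw)
    return match.groupdict() if match else None


def transform(rec):
    """Transformation: normalize types and derive a day column."""
    ts = datetime.strptime(rec["ts"], "%d/%b/%Y:%H:%M:%S %z")
    return {
        "ip": rec["ip"],
        "day": ts.strftime("%Y-%m-%d"),
        "method": rec["method"],
        "path": rec["path"],
        "status": int(rec["status"]),
        "bytes": 0 if rec["size"] == "-" else int(rec["size"]),
    }


def load(records):
    """Loading: emit tab-separated rows, a layout a text-format table can read."""
    return ["\t".join(str(r[c]) for c in COLUMNS) for r in records]


def run_etl(lines):
    """Chain the four stages; malformed lines are dropped after cleaning."""
    cleaned = (clean(line) for line in extract(lines))
    return load(transform(rec) for rec in cleaned if rec is not None)
```

Calling `run_etl` on a list of raw log lines returns one tab-separated row per well-formed line and silently drops lines that do not match the assumed pattern.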


Abstract

The invention discloses a method for querying and analyzing massive web log data based on Hadoop and Hive, drawing on the high reliability, scalability, efficiency, and fault tolerance of the Hadoop and Hive distributed computing platform. The method includes the following steps: analyzing the data of each data source; loading the data into a database; receiving HiveQL statements; optimizing the received statements to obtain a preliminary map result; converting the received statements into a MapReduce task, executing the task, and storing the query result; segmenting the data; analyzing and mining the data; and loading the data into a MySQL database. For massive web log data, the method achieves precise query and data analysis, makes storage, query, and analysis of massive data scalable and efficient, and avoids the overall performance degradation caused by uneven job distribution under data skew.
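The embodiment mentions MapReduce with a combiner function. The following in-memory Python sketch shows the general idea of a combiner: pre-aggregating map output within each input split so that fewer key-value pairs reach the reduce stage. It does not use Hadoop and is not the patent's implementation; the page-hit counting job, the `(ip, path)` record shape, and all function names are assumptions for illustration.

```python
from collections import Counter, defaultdict


def map_phase(record):
    """Map: emit (path, 1) for each hypothetical (ip, path) log record."""
    _ip, path = record
    yield path, 1


def combine(pairs):
    """Combiner: sum counts per key locally, within a single split."""
    local = Counter()
    for key, value in pairs:
        local[key] += value
    return list(local.items())


def reduce_phase(key, values):
    """Reduce: sum the partial counts received for one key."""
    return key, sum(values)


def run_job(splits):
    """Run map, combine, and reduce over in-memory splits.

    Returns the final counts and the number of pairs that crossed
    the simulated shuffle after combining.
    """
    shuffled = defaultdict(list)
    shuffled_pairs = 0
    for split in splits:
        mapped = [kv for rec in split for kv in map_phase(rec)]
        for key, partial in combine(mapped):
            shuffled[key].append(partial)
            shuffled_pairs += 1
    result = dict(reduce_phase(k, vs) for k, vs in shuffled.items())
    return result, shuffled_pairs
```

Because each split sends at most one pair per distinct key, a frequently visited path contributes far fewer pairs to the shuffle than it would without the combiner, while the final counts are unchanged.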

Description

Technical field

[0001] The invention belongs to the technical field of computer information processing, and in particular relates to a massive web log data query and analysis method based on Hadoop and Hive.

Background

[0002] With the rapid development of Internet technology, applications and services running on the Internet have emerged in large numbers, and the era of big data has arrived. Each website is itself an independent information system; once these websites are interconnected through the network, the entire Internet becomes one huge information system. Visitors leave traces as they browse a website, and these traces are saved as web log files. Logs of systems, programs, operation and maintenance, transactions, and so on are becoming more and more important, because they are an important basis for system recovery, error tracking, security detection, and other operations.

[0003] D...


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/30
CPC: G06F16/951
Inventors: 马廷淮, 瞿晶晶, 田伟, 薛羽, 曹杰
Owner: 北京智融时代信息技术有限公司