Web page content statistical method and system based on distributed file storage

A distributed-file and website-page technology, applied to the field of website page content statistics based on distributed file storage.

Inactive Publication Date: 2014-04-16
BEIJING BEWINNER COMM CO LTD
6 Cites, 13 Cited by

AI Technical Summary

Problems solved by technology

[0006] The technical problem to be solved in this application is to provide a method and system for statistics of website page content based on dist



Examples


Embodiment 1

[0034] Figure 1 shows the flow of the method for counting website page content based on distributed file storage described in Embodiment 1 of the present application. The method includes:

[0035] Step 101: collect access logs and store them in a distributed file storage database, writing them to the corresponding collection documents in that database in units of hours;

[0036] Step 102: decompose the access logs stored in the distributed file storage database, find the valid log information in them, and establish a mapping relationship between the valid log information and the access log;

[0037] Step 103: perform a simplification operation on all valid log information under the access log, output the simplified valid log information, and, according to the simplified valid log information, output the click volume under the access log, with access as the primary key, and the value of the corresponding traffic ...
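The three steps above can be pictured with a minimal Python sketch. It rests on several assumptions that are not in the patent text: the log format (tab-separated page, session, bytes), the collection naming scheme (`access_log_YYYYMMDDHH`), the rule for what counts as a valid line, and every function name are hypothetical. A plain dictionary stands in for the distributed file storage database so the example runs without a server, and the "mapping" and "simplification" wording is read here as a map step followed by a reduce step, which is an interpretation rather than something the text confirms.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, Iterator, List, Optional, Tuple

# In-memory stand-in for the distributed file storage database (assumption):
# collection name -> list of raw log documents.
Store = Dict[str, List[dict]]


def collect(store: Store, raw: str, ts: datetime) -> str:
    """Step 101: file one raw log line under the collection for its hour."""
    name = f"access_log_{ts:%Y%m%d%H}"  # hypothetical naming scheme
    store.setdefault(name, []).append({"raw": raw})
    return name


def parse(raw: str) -> Optional[dict]:
    """Return the fields of one line, or None if the line is malformed.

    Assumed format: page<TAB>session<TAB>bytes.
    """
    parts = raw.strip().split("\t")
    if len(parts) != 3:
        return None
    page, session, size = parts
    try:
        nbytes = int(size)
    except ValueError:
        return None
    if not page or not session or nbytes < 0:
        return None
    return {"page": page, "session": session, "bytes": nbytes}


def map_valid(store: Store, collection: str) -> Iterator[Tuple[str, dict]]:
    """Step 102: emit (page, record) pairs for the valid lines of one hour."""
    for doc in store.get(collection, []):
        record = parse(doc["raw"])
        if record is not None:
            yield record["page"], record


def reduce_clicks(pairs: Iterable[Tuple[str, dict]]) -> Dict[str, dict]:
    """Step 103: total clicks and traffic per page (page is the key)."""
    totals = defaultdict(lambda: {"clicks": 0, "traffic": 0})
    for page, record in pairs:
        totals[page]["clicks"] += 1
        totals[page]["traffic"] += record["bytes"]
    return dict(totals)
```

Calling `reduce_clicks(map_valid(store, name))` for one hourly collection yields a per-page dictionary of click counts and traffic totals; summing those dictionaries across hours would give daily figures.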

Embodiment 2

[0070] Embodiment 2 of the present invention is a system for counting website page content based on distributed file storage, characterized in that it includes: an access log collection module 201, a search and analysis module 202, and a simplification and statistics module 203; wherein,

[0071] The access log collection module 201 is coupled with the distributed file storage database and is used to collect the access log and store it in the distributed file storage database, in the corresponding collection document, in units of hours;

[0072] The search and analysis module 202 is coupled with the distributed file storage database and the simplification and statistics module 203, and is used to decompose the access log stored in the distributed file storage database, find the valid log information, and establish a mapping relationship between the valid log information and ...
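The coupling between the modules can be sketched as three small classes. This is only an illustration of the wiring: the class names, the space-separated log line, the rule that a valid page path starts with `/`, and the in-memory `Database` stand-in are all assumptions, and the part of the description that is truncated above is not filled in.

```python
from collections import defaultdict
from datetime import datetime


class Database:
    """In-memory stand-in for the distributed file storage database."""

    def __init__(self):
        self.collections = defaultdict(list)


class CollectModule:
    """Plays the role of module 201: writes logs into hourly collections."""

    def __init__(self, db: Database):
        self.db = db

    def collect(self, raw: str, ts: datetime) -> None:
        self.db.collections[f"log_{ts:%Y%m%d%H}"].append(raw)


class StatsModule:
    """Plays the role of module 203: accumulates counts per page."""

    def __init__(self):
        self.clicks = defaultdict(int)

    def accept(self, page: str) -> None:
        self.clicks[page] += 1


class AnalysisModule:
    """Plays the role of module 202: coupled to the database and to 203."""

    def __init__(self, db: Database, stats: StatsModule):
        self.db = db
        self.stats = stats

    def run(self) -> None:
        for docs in self.db.collections.values():
            for raw in docs:
                page = raw.split(" ")[0]
                if page.startswith("/"):  # assumed validity rule
                    self.stats.accept(page)
```

Only the analysis class holds references to both the database and the statistics class, mirroring the statement that module 202 is coupled with both.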


Abstract

The invention discloses a web page content statistical method and system based on distributed file storage. The method comprises the following steps: acquiring an access log, storing it in a distributed file storage database, and writing it to the corresponding collection document of that database in units of hours; analyzing the access log stored in the distributed file storage database, finding the valid log information in it, and establishing a mapping relationship between the valid log information and the access log; and performing a simplification operation on the valid log information of the access log, outputting the simplified valid log information, and, according to it, outputting either the click volume with access under the access log as the primary key, together with the corresponding traffic intermediate data value, or the unique visitor count with access and session identity under the access log as the primary key, together with the corresponding traffic intermediate data value. The method and system solve the problem of computing a website's PV (page view) and UV (unique visitor) statistics with query conditions conveniently combined according to service type.
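The unique-visitor branch can be pictured as a two-stage aggregation: first one intermediate entry per (page, session) key, then a count of those entries per page. The sketch below is a hedged illustration only; the record fields `page`, `session`, and `bytes` and both function names are hypothetical, and the input is assumed to be already-validated log records.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def intermediate_by_session(records: Iterable[dict]) -> Dict[Tuple[str, str], int]:
    """Stage 1: traffic per (page, session) key, the intermediate data."""
    traffic = defaultdict(int)
    for r in records:
        traffic[(r["page"], r["session"])] += r["bytes"]
    return dict(traffic)


def unique_visitors(intermediate: Dict[Tuple[str, str], int]) -> Dict[str, dict]:
    """Stage 2: distinct sessions and total traffic per page."""
    result = defaultdict(lambda: {"visitors": 0, "traffic": 0})
    for (page, _session), nbytes in intermediate.items():
        result[page]["visitors"] += 1  # each key is one distinct session
        result[page]["traffic"] += nbytes
    return dict(result)
```

Keeping the session in the intermediate key is what lets repeated clicks from the same session collapse into a single visitor while their traffic is still summed.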

Description

Technical field

[0001] This application relates to the field of website data statistics and, in particular, to a method and system for website page content statistics based on distributed file storage.

Background

[0002] In the prior art, MongoDB (a distributed file storage database that sits between relational and non-relational databases) supports a very loose data structure in BSON, a JSON-like format, so it can store relatively complex data types. MongoDB's most notable feature is its powerful query language, whose syntax resembles that of an object-oriented query language. It can realize most of the functions of a single-table query in a relational database, and it also supports indexing of data. MongoDB aims to provide a scalable, high-performance data storage solution for web applications.

[0003] At present, websites based on files and databases need to perform UV (independent visitor, Unique Visitor, a ...


Application Information

IPC(8): G06F17/30, G06F11/34
CPC: G06F11/3476, G06F16/182
Inventor: 瞿继合赵哲曹东李建涛
Owner: BEIJING BEWINNER COMM CO LTD