Fully-automatic grouping method of WEB pages based on title separator

Keywords: delimiter, fully automatic technology. Applied in special data processing applications, instruments, and electrical digital data processing. Addresses the problem that existing tools cannot perform page grouping analysis well.

Status: Inactive. Publication date: 2009-11-11
北京黑米天成科技有限公司
Cites: 2; cited by: 9

AI Technical Summary

Problems solved by technology

[0003] None of the WEB analysis tools currently on the market solves the page grouping problem well.



Embodiment Construction

[0012] The technical scheme adopted by the present invention is as follows: WEB page title information is collected by embedding a JavaScript script in the WEB page source code; full-text retrieval technology identifies the delimiters in each title and splits the title into multiple keywords; these keywords are used as group names to automatically generate a tree structure, and related pages are automatically classified into each group.
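A minimal Python sketch of the title-splitting step, assuming the delimiter set is already known. The patent identifies delimiters with full-text retrieval; the fixed `DEFAULT_DELIMITERS` tuple below is a hypothetical stand-in. The sketch also assumes titles run from most specific to least specific (as in "Item - Category - Site") and reverses them so the keyword list runs from the top-level group downward.

```python
import re

# Hypothetical delimiter set; the patent detects delimiters via full-text
# retrieval rather than using a fixed list.
DEFAULT_DELIMITERS = ("-", "|", "_", ">", "»")


def split_title(title: str, delimiters=DEFAULT_DELIMITERS, root_first: bool = True) -> list[str]:
    """Split a page title into keywords at any of the given delimiter characters."""
    pattern = "|".join(re.escape(d) for d in delimiters)
    parts = [p.strip() for p in re.split(pattern, title)]
    keywords = [p for p in parts if p]  # drop empty pieces, e.g. from "A -- B"
    # Assumption: titles put the most specific part first ("Item - Category - Site"),
    # so reversing yields a root-first path usable as nested group names.
    return list(reversed(keywords)) if root_first else keywords


if __name__ == "__main__":
    print(split_title("Phone A - Phones - ExampleShop"))
    # ['ExampleShop', 'Phones', 'Phone A']
```

This naive split also breaks on a delimiter character that belongs inside a keyword (for example the hyphen in "T-shirt"); requiring whitespace around a delimiter is one possible refinement.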

[0013] The specific workflow is as follows:

[0014] (1) First, a JavaScript collection script is embedded in the source code of every WEB page to be grouped;

[0015] (2) The JavaScript script runs automatically and logs every visit, recording the visitor's source IP address, source URL, visit time, the URL of the visited page, the title of the visited page, the dwell time, where the visitor goes on leaving... and stores this information in a database;

[0016] (3) The system uses full-text search technology to identify the delimiters in the title and divide the ti...
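Steps (2) and (3) can be sketched as follows, assuming the collection script sends each visit record to a server that stores it in SQLite; the `visit_log` table and its column names are hypothetical. In place of the full-text retrieval the patent relies on, this sketch swaps in a simple document-frequency heuristic: a candidate character counts as a delimiter if it appears in at least `min_share` of the distinct stored titles.

```python
import sqlite3
from collections import Counter

# Hypothetical set of candidate separator characters to test.
CANDIDATES = set("-|_>»:/,")


def store_visits(conn: sqlite3.Connection, visits: list[tuple]) -> None:
    """Create the hypothetical visit_log table if needed and append visit records."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS visit_log (
               source_ip TEXT, source_url TEXT, visit_time TEXT,
               page_url TEXT, page_title TEXT,
               stay_seconds INTEGER, exit_url TEXT)"""
    )
    conn.executemany("INSERT INTO visit_log VALUES (?, ?, ?, ?, ?, ?, ?)", visits)
    conn.commit()


def detect_delimiters(conn: sqlite3.Connection, min_share: float = 0.5) -> list[str]:
    """Return candidate characters present in at least min_share of distinct titles."""
    titles = [row[0] for row in conn.execute("SELECT DISTINCT page_title FROM visit_log")]
    if not titles:
        return []
    doc_freq = Counter()
    for title in titles:
        # Count each candidate at most once per title (document frequency).
        doc_freq.update(set(title) & CANDIDATES)
    return sorted(ch for ch, n in doc_freq.items() if n / len(titles) >= min_share)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    store_visits(conn, [
        ("10.0.0.1", "https://search.example/", "2009-11-11T10:00:00",
         "https://shop.example/p/1", "Phone A - Phones - ExampleShop", 42, ""),
        ("10.0.0.2", "", "2009-11-11T10:05:00",
         "https://shop.example/p/2", "Phone B - Phones - ExampleShop", 30, ""),
        ("10.0.0.3", "", "2009-11-11T10:07:00",
         "https://shop.example/c/tv", "TVs - ExampleShop", 12, ""),
        ("10.0.0.4", "", "2009-11-11T10:09:00",
         "https://shop.example/about", "About us", 5, ""),
    ])
    print(detect_delimiters(conn))  # ['-']
```

The threshold trades recall for precision: a character that is common in ordinary text, such as a comma, may pass it on some sites, so the choice of candidate set matters as much as the threshold.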


Abstract

The invention discloses a fully automatic method for grouping WEB pages based on title separators: the separators in WEB page titles are used to automatically generate a page grouping tree, and pages are automatically classified into each group. The method can be applied in the field of network WEB analysis, and its core concept comprises: (1) information about visited pages, including their titles, is obtained through WEB behavior acquisition technology; (2) the separators in the title text are used to segment each title into multiple key phrases; (3) a tree-like hierarchical structure is set up in which each key phrase corresponds to a level, so that hierarchically decomposing a title yields a path through the root node, intermediate nodes, and leaf node of the tree; (4) if a key phrase from the hierarchical decomposition of a title cannot be found at the corresponding hierarchical node of the tree (the search is limited to that path), the key phrase is used as an attribute to establish a new node according to the rules of the tree structure, thereby building the page classification tree.
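Steps (3) and (4) of the abstract amount to inserting each title's keyword path into a prefix tree. A minimal sketch, assuming titles have already been split into root-first keyword lists; the synthetic `ROOT` node and the choice to file each page at the node of its last keyword are assumptions, not details confirmed by the source. At each level only the children of the current node on the path are searched, and a missing keyword becomes a new child node.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class GroupNode:
    """One group in the page tree; `pages` holds URLs filed directly at this node."""
    name: str
    children: dict[str, GroupNode] = field(default_factory=dict)
    pages: list[str] = field(default_factory=list)


def add_page(root: GroupNode, keywords: list[str], url: str) -> GroupNode:
    """Walk the keyword path from the root, creating missing nodes; file the page at the end."""
    node = root
    for keyword in keywords:
        # Search only among the children of the current node on this path.
        child = node.children.get(keyword)
        if child is None:
            # Keyword not found at this level: create it as a new node.
            child = GroupNode(keyword)
            node.children[keyword] = child
        node = child
    node.pages.append(url)  # assumption: a page belongs to its deepest group
    return node


def build_tree(pages: list[tuple[str, list[str]]]) -> GroupNode:
    """Build the grouping tree from (url, root-first keyword list) pairs."""
    root = GroupNode("ROOT")  # hypothetical synthetic root
    for url, keywords in pages:
        add_page(root, keywords, url)
    return root


def render(node: GroupNode, depth: int = 0) -> list[str]:
    """Return indented lines of the form 'name [number of pages filed here]'."""
    lines = [f"{'  ' * depth}{node.name} [{len(node.pages)}]"]
    for child in node.children.values():
        lines.extend(render(child, depth + 1))
    return lines


if __name__ == "__main__":
    tree = build_tree([
        ("https://shop.example/p/1", ["ExampleShop", "Phones", "Phone A"]),
        ("https://shop.example/p/2", ["ExampleShop", "Phones", "Phone B"]),
        ("https://shop.example/c/tv", ["ExampleShop", "TVs"]),
    ])
    print("\n".join(render(tree)))
```

Because the lookup is confined to the current path, the same keyword (say, "Phones") under two different parents produces two distinct groups, which mirrors the abstract's restriction of the search to the route.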

Description

Technical field

[0001] The invention relates to the field of Internet WEB analysis.

Background art

[0002] With the development of the Internet, WEB analysis has become an important means for companies to understand how their own websites are operating. Many websites display a very large number of content pages. An e-commerce website, for example, has a product catalog that corresponds to many product category pages and individual product pages, and it is essential to classify and group these pages and assign each one to its category. How to automatically identify the category of tens of millions of pages has therefore become an important topic in WEB content analysis.

[0003] None of the WEB analysis tools currently on the market solves this page grouping problem well.

Contents of the invention

[0004] In order to solve the above-mentioned existing problems, the present invention discloses a fully automatic WEB pag...


Application Information

Patent type & authority: Application (China)
IPC(8): G06F17/30
Inventor: 王凯
Owner: 北京黑米天成科技有限公司