Method and device of determining webpage clustering mode

A webpage clustering technology, applied in the Internet field, that addresses the problems of similar webpages failing to cluster into the same pattern, poor clustering effect, and low efficiency, thereby optimizing the clustering effect and improving clustering efficiency.

Inactive Publication Date: 2014-05-28
BEIJING QIHOO TECH CO LTD +1
3 Cites, 9 Cited by

AI Technical Summary

Problems solved by technology

[0004] However, in the existing prefix pattern clustering technology described above, the prefix patterns are relatively fine-grained, so some similar webpages cannot be clustered into the same pattern. Suppose there are two user pages, http://360.cn/xxx/album/ and http://360.cn/yyy/album/, where "xxx" and "yyy" are the identities (IDs) registered by the two users. Prefix clustering can only cluster them to http://360.cn/xxx/ and http://360.cn/yyy/ respectively, although the two pages are in fact similar (for example, both are user photo albums), so they cannot be clustered into the same pattern. As a result, the existing webpage clustering technology has a poor clustering effect and low efficiency.

Examples

Embodiment Construction

[0039] The preferred embodiments of the present invention will be described below in conjunction with the accompanying drawings. It should be understood that the preferred embodiments described here are only used to illustrate and explain the present invention, and are not intended to limit the present invention.

[0040] Figure 1 is a flowchart of a method for determining a webpage clustering mode provided by an embodiment of the present invention. The method in this embodiment applies to scenarios in which R&D personnel analyze webpage clustering. The executing entity may be a device for determining the webpage clustering mode, and that device may specifically be an entity implemented as integrated software. As shown in Figure 1, the method for determining the webpage clustering mode in this embodiment may specifically include the following step...

Abstract

The invention discloses a method and a device for determining a webpage clustering mode. The method comprises: obtaining the main prefix of the uniform resource locator (URL) of a webpage to be clustered; segmenting the main prefix to obtain a plurality of fields; matching the segmented fields against at least one preset reserved field in a reserved-field dictionary together with the position information of each preset reserved field, and taking the field portions that are identical to a preset reserved field and located at the corresponding position as reserved fields; and generating the clustering mode of the URL according to the reserved fields among the plurality of fields and the position information of those reserved fields. The invention also discloses a device for implementing the method. With this technical scheme, more webpages can be clustered under one clustering mode, which effectively optimizes the webpage clustering effect and improves webpage clustering efficiency.
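
The steps in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the patented implementation. Several details are assumptions that the text above does not specify: the "main prefix" is taken to be the scheme, host, and path up to its last "/"; positions are counted from 1 among the path fields; non-reserved fields are replaced by a `*` wildcard; and the dictionary contents (`album` at position 2) are hypothetical.

```python
from urllib.parse import urlsplit

# Hypothetical reserved-field dictionary: field text -> positions
# (1-based, counted among path fields) at which it is reserved.
RESERVED_FIELDS = {"album": {2}}

WILDCARD = "*"  # assumption: non-reserved fields collapse to a wildcard


def clustering_mode(url, reserved=RESERVED_FIELDS):
    """Return a clustering pattern for url (sketch under stated assumptions)."""
    parts = urlsplit(url)
    # Assumed "main prefix": scheme, host, and the path up to its last "/".
    directory = parts.path[: parts.path.rfind("/") + 1]
    # Segment the main prefix into fields.
    fields = [f for f in directory.split("/") if f]
    kept = []
    for position, field in enumerate(fields, start=1):
        # A field is kept only if it matches a reserved field
        # at the same position; otherwise it becomes a wildcard.
        if position in reserved.get(field, ()):
            kept.append(field)
        else:
            kept.append(WILDCARD)
    return f"{parts.scheme}://{parts.netloc}/" + "".join(k + "/" for k in kept)


if __name__ == "__main__":
    for u in ("http://360.cn/xxx/album/", "http://360.cn/yyy/album/"):
        print(u, "->", clustering_mode(u))
```

Under these assumptions, both user pages map to the same pattern, so they fall into one cluster even though their user IDs differ.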

Description

Technical field

[0001] The invention relates to the technical field of the Internet, and in particular to a method and a device for determining a webpage clustering mode.

Background technique

[0002] In web page data mining, in order to mine and analyze web page data effectively, clustering technology is usually used to gather Uniform Resource Locators (URLs) that are similar in appearance or function into a collection, and the collection can be represented by a pattern.

[0003] In a web page clustering method commonly used in the prior art, URLs are segmented according to a certain delimiter to form patterns. For example, the URL "http://www.360.cn/weishi/index.html" can be divided according to the separator "/" into two prefix patterns, http://www.360.cn/ and http://www.360.cn/weishi/. Commonly used separators include "/", "?", "-", and "_". This approach can be called prefix pattern clustering. The fe...
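
To make the prior-art prefix approach concrete, here is a minimal sketch of prefix pattern generation. It handles a single separator ("/") and the function name `prefix_patterns` is hypothetical. It is meant only to illustrate the segmentation described in [0003].

```python
def prefix_patterns(url, separator="/"):
    """List every prefix of url that ends at a separator after the scheme."""
    scheme, marker, rest = url.partition("://")
    patterns = []
    start = 0
    while True:
        idx = rest.find(separator, start)
        if idx == -1:
            break
        # Each prefix runs from the scheme through this separator.
        patterns.append(scheme + marker + rest[: idx + 1])
        start = idx + 1
    return patterns


if __name__ == "__main__":
    print(prefix_patterns("http://www.360.cn/weishi/index.html"))
    # Two user pages share only the host-level prefix.
    a = set(prefix_patterns("http://360.cn/xxx/album/"))
    b = set(prefix_patterns("http://360.cn/yyy/album/"))
    print(a & b)
```

Applied to the two user-page URLs from the summary, the only shared prefix is the host-level one, which illustrates why prefix patterns alone cannot group pages that differ in an early path field.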

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
CPC: G06F16/9566
Inventor: 王智广
Owner: BEIJING QIHOO TECH CO LTD