
Method and device for clustering high-frequency keywords in webpages

A keyword and keyword-combination technique applied in the Internet field. It addresses three problems of full-text clustering: the large amount of information to process, words that do not reflect the main content of a document, and the heavy workload of handling the full text. The result simplifies information collection, makes reading easier, and saves time.

Active Publication Date: 2013-08-21
北界无限(北京)软件有限公司
Cites: 7 · Cited by: 38

AI Technical Summary

Problems solved by technology

[0003] To obtain different types of information effectively, the existing technology clusters multiple web documents. However, the clustering method in the prior art is based on the full text of the web documents. Because the full text of a web document contains a large amount of information, clustering requires a lot of work. At the same time, the full text covers a lot of content, and some words do not reflect the main content of the document; these words reduce the accuracy of document clustering. Therefore, clustering web documents by their full text cannot meet the clustering requirements for information.



Examples


Example 2

[0076] In Example 2, the following formula (1) can be used as the default matching condition:

[0077] $$\sum_{x=n-m-1}^{n-1} S(x) > \sum_{x=n-m}^{n} S(x) \tag{1}$$

[0078] Here, n is the current generation, m is the specified threshold, and S(x) is the fitness of the best individual in the x-th generation. That is, when the sum of the fitness of the optimal individuals from generation n-m-1 to generation n-1, in total m ...
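The comparison in formula (1) can be sketched in a few lines of Python. This is a minimal illustration, not the patent's implementation: the function name `matches_formula_1` and the 0-indexed list `best_fitness` (holding S(x) for each generation) are assumptions made here, and what the method does once the condition is matched is not visible in this excerpt.

```python
from typing import Sequence


def matches_formula_1(best_fitness: Sequence[float], m: int) -> bool:
    """Return True when formula (1) holds for the latest generation.

    best_fitness[x] stands for S(x), the fitness of the best individual
    in generation x (0-indexed here, which is an assumption); n is the
    index of the last recorded generation and m is the threshold.
    """
    n = len(best_fitness) - 1
    if m < 0 or n - m - 1 < 0:
        # Not enough recorded generations to evaluate both sums.
        return False
    previous_window = sum(best_fitness[n - m - 1 : n])  # x = n-m-1 .. n-1
    current_window = sum(best_fitness[n - m : n + 1])   # x = n-m   .. n
    return previous_window > current_window
```

Because the two sums share the terms from generation n-m to generation n-1, the inequality as written is equivalent to S(n-m-1) > S(n): the best fitness m+1 generations ago exceeds the current best fitness.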


Abstract

The invention provides a method and a device for clustering high-frequency keywords in webpages and relates to the field of the Internet. The method includes: capturing a plurality of webpage documents corresponding to a plurality of webpages; performing word segmentation on each captured webpage document to obtain multiple terms; determining the keyword combination corresponding to each webpage document; and obtaining high-frequency keywords from the keyword combinations and clustering them by similarity so as to obtain high-frequency keywords of the same kind. A keyword combination includes keywords indicating the content of the corresponding webpage document, and the high-frequency keywords in the keyword combinations are keywords that meet preset conditions within a preset time period. Through clustering, related webpage documents are classified into the same kind, so users can read webpage documents of the same kind more conveniently, their information search is simplified, and their time is saved.
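The steps listed in the abstract can be sketched end to end as follows. Every concrete rule in this sketch is an assumption chosen for illustration, since the abstract does not specify them: the regex tokenizer stands in for real word segmentation, a document's keyword combination is taken to be its `top_k` most frequent terms, the preset condition is "appears in the keyword combinations of at least `min_docs` documents within the time window", similarity is the Jaccard index of the document sets, and grouping is single-link merging. All function and parameter names are hypothetical.

```python
import re
from collections import Counter
from datetime import datetime
from itertools import combinations


def segment(text):
    """Stand-in word segmentation: lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def keyword_combination(terms, top_k=2):
    """Assumed rule: the top_k most frequent terms of one document."""
    return [term for term, _ in Counter(terms).most_common(top_k)]


def high_frequency_keywords(docs, start, end, min_docs=2, top_k=2):
    """Map each high-frequency keyword to the ids of documents using it.

    docs maps doc_id -> (timestamp, text). Only documents with a
    timestamp inside [start, end] are counted (assumed preset period).
    """
    occurrences = {}
    for doc_id, (timestamp, text) in docs.items():
        if start <= timestamp <= end:
            for kw in keyword_combination(segment(text), top_k):
                occurrences.setdefault(kw, set()).add(doc_id)
    return {kw: ids for kw, ids in occurrences.items() if len(ids) >= min_docs}


def jaccard(a, b):
    """Jaccard similarity of two non-empty sets."""
    return len(a & b) / len(a | b)


def cluster(keyword_docs, threshold=0.5):
    """Single-link grouping of keywords whose document sets are similar."""
    parent = {kw: kw for kw in keyword_docs}

    def find(kw):
        # Follow parent links to the group representative.
        while parent[kw] != kw:
            parent[kw] = parent[parent[kw]]
            kw = parent[kw]
        return kw

    for a, b in combinations(keyword_docs, 2):
        if jaccard(keyword_docs[a], keyword_docs[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for kw in keyword_docs:
        groups.setdefault(find(kw), set()).add(kw)
    return sorted(sorted(group) for group in groups.values())
```

Working on keyword combinations rather than full text keeps the per-document input to a handful of terms, which is the workload reduction the abstract argues for; a production system would replace each assumed rule with the one defined in the full description.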

Description

Technical field

[0001] The present invention relates to the field of the Internet, in particular to a method and device for clustering high-frequency keywords in webpages.

Background technique

[0002] With Internet information increasing rapidly, how to discover the most valuable information remains an unsolved problem. Information is released through multiple channels and in multiple forms, and even the same information may be described in different ways, which makes it harder for readers to accurately obtain a given type of information.

[0003] To obtain different types of information effectively, the existing technology clusters multiple web documents. However, the clustering method in the prior art is based on the full text of the web documents. Because the full text of a web document contains a large amount of information, clustering requires a lot of work. At the same time, the full text involves a lot of content, and some words ...


Application Information

IPC(8): G06F17/30, G06N3/12
Inventor: 李学科
Owner: 北界无限(北京)软件有限公司