Webpage clustering processing method based on improved K-means algorithm

An improved K-means processing method, applied in the fields of electrical digital data processing, natural language data processing, and special data processing applications. It addresses the problem that the optimal solution of K-means is affected by multiple factors, with the effects of shortening calculation time, ensuring reliability and precision, and improving efficiency.

Pending Publication Date: 2020-07-10
CHONGQING UNIV OF POSTS & TELECOMM +1

AI Technical Summary

Problems solved by technology

At the present stage, the K-means algorithm has given rise to multiple variants, such as density-based and model-based ones. Although the algorithm has been widely adopted and actively used in many fields thanks to its strengths, its optimal solution is affected by multiple factors.

Examples

Embodiment Construction

[0033] The technical solutions in the embodiments of the present invention will be described clearly and in detail below with reference to the drawings in the embodiments of the present invention. The described embodiments are only some of the embodiments of the invention.

[0034] The technical solution by which the present invention solves the above technical problems is as follows:

[0035] In this embodiment, an improved web page clustering processing method based on the K-means algorithm is performed as follows.

[0036] Step 1: Collect webpage text dataset

[0037] Collect the webpage text data set, using a web crawler tool developed in the Python language to grab the required text information.
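The text-extraction part of Step 1 can be sketched as follows. This is only an illustration under assumptions: the source states that the crawler is written in Python but gives no further detail, so the names `TextExtractor`, `extract_text`, and `fetch_html` are hypothetical, and only the Python standard library is used.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def extract_text(html: str) -> str:
    """Return the visible text of an HTML document as one string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def fetch_html(url: str) -> str:
    """Download one page (hypothetical helper; needs network access)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")
```

A page would then be turned into one text record with `extract_text(fetch_html(url))`. A production crawler would also need link discovery, rate limiting, and character-set detection, none of which are described in this excerpt.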

[0038] Step 2: Preprocess the collected data

[0039] Preprocess the collected data set: use a Chinese text word segmentation tool to segment the acquired text informati...
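To illustrate what word segmentation does to unspaced Chinese text, here is a toy sketch of forward maximum matching, a simple dictionary-based technique. It is a stand-in only: the source does not name the segmentation tool it uses, and the function name `forward_max_match`, the vocabulary, and the `max_word_len` parameter are all assumptions made for this example.

```python
def forward_max_match(text, vocabulary, max_word_len=4):
    """Segment text greedily, preferring the longest dictionary word at each position.

    Characters that match no dictionary word are emitted as single-character tokens.
    """
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a match is found
        for size in range(min(max_word_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in vocabulary:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Hypothetical mini-dictionary, for illustration only
vocabulary = {"网页", "聚类", "算法", "数据", "挖掘", "数据挖掘"}
tokens = forward_max_match("网页聚类算法", vocabulary)
```

Real segmentation tools typically combine a much larger dictionary with statistical models, so their output can differ from this greedy approach.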

Abstract

The invention discloses a webpage clustering processing method based on an improved K-means algorithm. The method mainly combines an initial cluster center selection algorithm with the K-means clustering algorithm, replacing the traditional K-means algorithm, to cluster webpage information. This ensures the reliability and authenticity of the clustering result from the source, and the method is applied to webpage big data processing. Analysis of the experimental results further shows that, compared with the traditional algorithm, the advantages of the improved algorithm are reflected not only in the accuracy of the clustering result but also in its operating efficiency; the improved algorithm is fully suitable for webpage clustering analysis, optimizes the clustering of webpages, and improves the stability of the algorithm.
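The general idea of pairing a deterministic center selection step with standard K-means can be sketched as below. This is not the patent's algorithm: the excerpt does not say how the initial centers are chosen, so farthest-point (maximin) seeding is swapped in as an assumption, and the names `select_initial_centers` and `kmeans` are hypothetical.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_initial_centers(points, k):
    """Farthest-point seeding (an assumed stand-in for the patent's selection step)."""
    dim = len(points[0])
    mean = [sum(p[d] for p in points) / len(points) for d in range(dim)]
    # Start from the point closest to the overall mean
    centers = [min(points, key=lambda p: distance(p, mean))]
    while len(centers) < k:
        # Add the point whose nearest chosen center is farthest away
        centers.append(max(points, key=lambda p: min(distance(p, c) for c in centers)))
    return centers


def kmeans(points, k, max_iter=100):
    """Standard K-means iterations starting from the selected centers."""
    centers = [list(c) for c in select_initial_centers(points, k)]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center
        new_labels = [min(range(k), key=lambda j: distance(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the centers will not move
        labels = new_labels
        # Update step: move each center to the mean of its members
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers
```

Because the seeding is deterministic, repeated runs on the same data give the same clusters, which is one way the stability mentioned above could be obtained; random initialization, by contrast, can produce different results from run to run.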

Description

Technical field

[0001] The invention belongs to the field of data mining, and in particular relates to an improved K-means-based processing method applied to web page clustering.

Background technique

[0002] Cluster analysis is one of the important basic research topics in current data mining technology. Clustering refers to placing data objects with high similarity into the same cluster, while placing data objects with high dissimilarity in different clusters as far as possible. The K-means algorithm is one of the most representative machine learning algorithms. A large number of practical studies have confirmed that this algorithm has multiple advantages, such as simple operation and reliable results. At the present stage, this algorithm has given rise to multiple variants, such as density-based and model-based ones. Although the algorithm has achieved large-scale popularization and ac...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/33, G06F16/35, G06F16/951, G06F40/289, G06K9/62
CPC: G06F16/3344, G06F16/35, G06F16/951, G06F18/23213
Inventors: 李校林, 谭航
Owner CHONGQING UNIV OF POSTS & TELECOMM