
A clustering method based on big data

A clustering method based on big data technology, applied in the field of cluster analysis. It addresses problems such as reduced search speed and poor efficiency in retrieving the information users are looking for, and improves the accuracy and effectiveness of clustering.

Active Publication Date: 2021-08-10
成都东方盛行电子有限责任公司
Cites: 4; cited by: 0

AI Technical Summary

Problems solved by technology

The quality of text clustering strongly affects how efficiently users can retrieve the information they are looking for. For example, compared with organizing documents sequentially, clustering documents randomly does not improve search efficiency; it actually reduces search speed.

Method used

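Each news item is word-segmented, represented as a VSM vector, and compared with the cluster center of every existing category. The item joins the most similar category if the similarity exceeds a preset threshold; otherwise it founds a new category. Cluster centers are updated by comparing average within-cluster similarities, and once all news has been processed, a preset algorithm computes popularity to extract hot news.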



Detailed Description of Embodiments

[0028] To give a clearer understanding of the technical features, purposes and effects of the present invention, specific embodiments of the invention are now described with reference to the accompanying drawings.

[0029] As shown in Figure 1, a clustering method based on big data includes the following steps:

[0030] S1. Perform word segmentation on news D to obtain news S;

[0031] S2. Determine whether news S is the first news item; if so, go to S5; if not, go to S3;

[0032] S3. Build a VSM (vector space model) vector for news S, and calculate the similarity between news S and the cluster center of every category;

[0033] S4. Find the category C with the maximum similarity to news S. If the similarity between news S and category C is greater than a preset threshold, classify news S into category C; if it is less than the preset threshold, go to S5;

[0034] S5. Create a new category based on the news S;

[0035] ...
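Steps S1 to S5 describe an incremental (single-pass) clustering loop. The Python sketch below is one possible reading, not the patent's implementation: it assumes term-frequency weights for the VSM vectors and cosine similarity as the similarity measure, neither of which the text specifies. The function names and the default threshold of 0.3 are hypothetical, and word segmentation (S1) is assumed to have been done already by an external segmenter.

```python
from collections import Counter
import math


def vsm_vector(tokens):
    """S3 (assumed weighting): term-frequency vector for one segmented news item."""
    return Counter(tokens)


def cosine(u, v):
    """Cosine similarity between two sparse term-frequency vectors (Counters)."""
    dot = sum(w * v[t] for t, w in u.items())  # missing terms count as 0
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def single_pass_cluster(segmented_news, threshold=0.3):
    """Incremental clustering following steps S2-S5 (hypothetical helper).

    segmented_news: list of token lists; S1 (word segmentation) is assumed done.
    threshold: hypothetical stand-in for the patent's "preset threshold".
    Each cluster stores the index of its center item and its member indices;
    the founding item stays the center here (center updates are not modeled).
    """
    vectors = [vsm_vector(tokens) for tokens in segmented_news]
    clusters = []
    for i, vec in enumerate(vectors):
        if not clusters:
            # S2: the first news item goes straight to S5 and founds a category.
            clusters.append({"center": i, "members": [i]})
            continue
        # S3: similarity between this item and every cluster center.
        sims = [cosine(vec, vectors[c["center"]]) for c in clusters]
        # S4: category C with the maximum similarity.
        best = max(range(len(clusters)), key=lambda k: sims[k])
        if sims[best] > threshold:
            clusters[best]["members"].append(i)
        else:
            # S5: not above the threshold; ties also land here, which the text leaves open.
            clusters.append({"center": i, "members": [i]})
    return clusters


news = [
    ["stock", "market", "rises"],
    ["stock", "market", "falls"],
    ["football", "match", "tonight"],
]
print(single_pass_cluster(news, threshold=0.5))
# [{'center': 0, 'members': [0, 1]}, {'center': 2, 'members': [2]}]
```

Here the two stock-market items share two of three terms (cosine similarity 2/3, above 0.5) and form one cluster, while the football item starts its own. Comparing each new item only against cluster centers, rather than against every member, keeps each step proportional to the number of clusters, which suits a continuous news stream.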



Abstract

The invention discloses a clustering method based on big data, comprising the following steps: performing word segmentation on news D to obtain news S; determining whether news S is the first news item, and if so, creating a new category based on news S; if not, building a VSM vector model for news S and calculating the similarity between news S and the cluster centers of all categories; finding the category C with the greatest similarity to news S, and if that similarity is greater than the preset threshold, classifying news S into category C, or if it is less than the preset threshold, creating a new category based on news S; calculating the average similarity M1 between news S and the other news in category C, and the average similarity M2 between the cluster center and the other news in category C, and if M1 is greater than M2, making news S the new cluster center, otherwise leaving the cluster center unchanged; and determining whether all current news has been processed, and if so, calculating the popularity of the news through a preset algorithm and extracting hot news, otherwise continuing with the next article.
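The cluster-center update in the abstract compares two averages. The Python sketch below illustrates it under stated assumptions: cosine similarity over term-frequency vectors as the similarity measure, M1 taken as the mean similarity between news S and every other member of category C, and M2 as the mean similarity between the current center and every other member of C (one reading of the abstract's wording). The function names are hypothetical, and the popularity step is not modeled because the preset algorithm is not described.

```python
from collections import Counter
import math


def cosine(u, v):
    """Cosine similarity between two term-frequency vectors (assumed measure)."""
    dot = sum(w * v[t] for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def average_similarity(idx, member_ids, vectors):
    """Mean similarity between item idx and every other member of the cluster."""
    others = [m for m in member_ids if m != idx]
    if not others:
        return 0.0
    return sum(cosine(vectors[idx], vectors[m]) for m in others) / len(others)


def maybe_update_center(cluster, new_id, vectors):
    """Promote newly added item new_id to cluster center if M1 > M2 (hypothetical helper).

    cluster: dict with "center" (index) and "members" (indices, including new_id).
    Returns (M1, M2) so the decision can be inspected.
    """
    m1 = average_similarity(new_id, cluster["members"], vectors)            # news S vs. other news in C
    m2 = average_similarity(cluster["center"], cluster["members"], vectors)  # center vs. other news in C
    if m1 > m2:
        cluster["center"] = new_id  # S becomes the new cluster center
    return m1, m2  # otherwise the center is left unchanged


vectors = [Counter(["a", "x"]), Counter(["a", "b"]), Counter(["a", "b"])]
cluster = {"center": 0, "members": [0, 1, 2]}
m1, m2 = maybe_update_center(cluster, 2, vectors)
print(round(m1, 2), round(m2, 2), cluster["center"])  # 0.75 0.5 2
```

Averaging over each candidate's own "other members" makes the rule behave like a medoid update: the center moves to whichever item is, on average, closer to the rest of the cluster. With only two members, M1 equals M2 and the center stays put. The update only matters when news S joins an existing category; a newly created category simply has S as its sole member.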

Description

Technical field

[0001] The invention relates to the technical field of cluster analysis, and in particular to a clustering method based on big data.

Background

[0002] Owing to the rapid worldwide growth of the Internet and the rapid development of information technology, the volume of data people use is growing explosively. Large amounts of data stored in databases could serve government offices, business intelligence, scientific research, project development and other fields, but making use of this data is not easy. Understanding the massive data in a database is no longer within human capacity. Without automatic analysis methods, the large amount of data stored in the database becomes a "data grave", an archive that is rarely accessed again. Because decision makers cannot manually mine useful knowledge from massive data, the important decisions they make are not based on the data in the data...


Application Information

Patent type & authority: Patent (China)
IPC (8): G06F16/35; G06F40/289; G06K9/62
CPC: G06F16/35; G06F16/355
Inventors: 马萧萧温大川吴春才冯良怀文斌杨树海姚晴麟
Owner: 成都东方盛行电子有限责任公司