Clustering method for unsupervised learning of Chinese comments, computer program product and server system

An unsupervised-learning clustering technology applied in the field of data mining and processing. It addresses problems of existing tag-generation methods: results that hardly reflect the real situation, excessive noise, and ineffectiveness for new comments or keywords.

Inactive Publication Date: 2019-06-11
小视科技(江苏)股份有限公司
AI Technical Summary

Problems solved by technology

There are two main ways to generate these tags. The first is extraction: the words or phrases with the highest frequency are extracted on statistical principles, formed into tags, and arranged in order of frequency. This method generates a lot of noise when tagging, and extraction based only on statistical principles often yields strange results (labels) that do not truly reflect the characteristics of the reviews or products. The second is based on pre-customized labels: each time a custom label occurs in the comment information, its count is incremented by 1. After all comments have been queried, the cumulative counts of the custom labels are obtained, and the top N are taken and arranged to give the final labeling result. This method often requires considerable manual labor for labeling, is inefficient, and can only accumulate counts for the custom tags, so it is often ineffective for new comments or keywords.
[0003] Taken together, both of the above methods are based on supervised clustering, which makes it difficult for them to reflect the real situation.
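
The two existing approaches described above can be illustrated with a minimal Python sketch. It is not taken from the patent: the function names, the whitespace-separated toy comments, and the choice to count a custom label once per comment that contains it are all assumptions made for illustration.

```python
from collections import Counter

def top_frequent_terms(tokenized_comments, n):
    """Extraction baseline: rank every term by raw frequency."""
    counts = Counter(tok for comment in tokenized_comments for tok in comment)
    return counts.most_common(n)

def count_predefined_tags(comments, tags, n):
    """Custom-label baseline: add 1 for each comment containing a label.

    Assumption: a label is counted once per comment in which it appears.
    """
    counts = Counter()
    for comment in comments:
        for tag in tags:
            if tag in comment:
                counts[tag] += 1
    return counts.most_common(n)

# Hypothetical toy comments, already segmented with spaces.
comments = ["物流很快 包装很好", "物流很快", "屏幕清晰"]
tokenized = [c.split() for c in comments]

print(top_frequent_terms(tokenized, 1))
# The custom-label list knows nothing about "屏幕清晰", so that
# keyword can never surface, whatever the comments say.
print(count_predefined_tags(comments, ["物流很快", "价格便宜"], 5))
```

The second call shows the limitation noted in the text: a keyword that is absent from the custom label list is never counted.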

Examples

Embodiment Construction

[0037] For a better understanding of the technical content of the present invention, specific embodiments are described below with reference to the accompanying drawings.

[0038] Aspects of the invention are described in this disclosure with reference to the accompanying drawings, which show a number of illustrated embodiments. Embodiments of the present disclosure are not necessarily intended to include all aspects of the invention. It should be appreciated that the various concepts and embodiments introduced above, as well as those concepts and implementations described in more detail below, can be implemented in any of numerous ways.

[0039] With reference to Figure 1, the clustering method for unsupervised learning of Chinese reviews according to the disclosed embodiments of the present invention aims to cluster the reviews obtained for a company's products or services and to obtain the TopN review tags that best summarize the review results, fo...

Abstract

The invention provides a clustering method for unsupervised learning of Chinese comments, a computer program product and a server system. The clustering method comprises the following steps: obtaining comment data and sorting it to obtain a corpus; preprocessing the comment content in the corpus, and carrying out word segmentation and word vector training; extracting candidate tags; performing duplicate removal on the candidate tag library; carrying out sentiment word filtering on the deduplicated candidate tags; carrying out a DBSCAN-based clustering operation on the candidate tags that remain after the invalid tags are removed, to obtain the size of each cluster of candidate tags, and arranging the clustering results in descending order of that number; and finally, counting the size of each cluster and outputting the TopN. The method provides a clustering mode based on unsupervised learning. It addresses the problem that existing label clustering methods struggle to express a comment result objectively: extraction and learning can be carried out autonomously and without supervision from the actual content of the comments and labels, and the clustering result reflects the real comment result more objectively.
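
The later steps of the abstract (duplicate removal, filtering, DBSCAN clustering, ranking by cluster size, TopN output) can be sketched in Python as follows. This is a sketch under stated assumptions, not the patent's implementation: the `dbscan` helper is a minimal textbook version written for this example, the `is_valid` predicate stands in for the sentiment word filter (whose rules the abstract does not specify), the 2-D vectors are hand-made stand-ins for trained word vectors, and using the first member of each cluster as its label is an arbitrary choice.

```python
import math

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN. Returns one label per point; -1 marks noise.

    min_pts counts the point itself among its neighbours.
    """
    labels = [None] * len(points)

    def neighbors(i):
        return [j for j in range(len(points))
                if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1            # provisional noise
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster   # noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:  # core point: keep expanding
                queue.extend(nbrs)
    return labels

def top_n_tags(candidates, vectors, is_valid, eps, min_pts, n):
    """Deduplicate, filter, cluster, then rank clusters by size."""
    unique = list(dict.fromkeys(candidates))      # order-preserving dedup
    kept = [t for t in unique if is_valid(t)]     # hypothetical filter
    labels = dbscan([vectors[t] for t in kept], eps, min_pts)
    clusters = {}
    for tag, label in zip(kept, labels):
        if label != -1:
            clusters.setdefault(label, []).append(tag)
    ranked = sorted(clusters.values(), key=len, reverse=True)
    # Assumption: the first member stands in as the cluster's label.
    return [(members[0], len(members)) for members in ranked[:n]]

# Hand-made 2-D vectors standing in for trained word vectors.
vectors = {
    "物流快": (0.0, 0.0), "发货快": (0.1, 0.0), "送货快": (0.0, 0.1),
    "屏幕清晰": (5.0, 5.0), "画面清楚": (5.1, 5.0), "的": (9.0, 9.0),
}
candidates = ["物流快", "发货快", "物流快", "送货快", "屏幕清晰", "画面清楚", "的"]
result = top_n_tags(candidates, vectors, lambda t: t != "的",
                    eps=0.5, min_pts=2, n=2)
print(result)
```

Because DBSCAN needs no predefined label list and no preset number of clusters, tags with similar vectors group together on their own, which is the property the abstract relies on. In practice `eps` and `min_pts` would have to be tuned to the word-vector space.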

Description

Technical field

[0001] The invention relates to the technical field of data mining and processing, in particular to a clustering method for unsupervised learning of Chinese comments, a computer program product and a server system.

Background technique

[0002] At present, when goods or services are evaluated on e-commerce platforms or forums, tags are often extracted and displayed through technical means, so that potential users can see the most direct evaluation of the products or services at a glance. There are two main ways to generate these tags. The first is extraction: the words or phrases with the highest frequency are extracted on statistical principles, formed into tags, and arranged in order of frequency. This method generates a lot of noise when tagging, and extraction based only on statistical principles often yields strange results (labels) that do not truly reflect the characteristics of the reviews or products; the other is based on the...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/35
Inventor: 杨帆于巨明尚应
Owner: 小视科技(江苏)股份有限公司