A text topic clustering algorithm based on natural language processing

A natural language processing and topic clustering technology, applied in text database clustering/classification, electronic digital data processing, special data processing applications, etc. It addresses problems such as economic losses, factors that existing methods fail to consider, and mistaken improvement ideas on the part of smart home equipment operators, and it aims to achieve high accuracy.

Active Publication Date: 2019-01-18
GUANGDONG UNIV OF TECH
7 Cites, 1 Cited by

AI Technical Summary

Problems solved by technology

[0003] (1) The accuracy of current Chinese word segmentation algorithms is not high;
[0004] (2) The accuracy of current text topic model construction algorithms is not high;
[0005] (3) Current text topic clustering algorithms cannot adequately remove the influence of historical records on current decisions; that is, they cannot gradually forget outdated review texts the way humans do. As a result, the user concerns they mine deviate from the users' latest focus of attention, which in turn leads businesses such as smart home equipment operators to make mistaken improvement decisions and suffer serious economic losses.
[0006] The method most similar to the present invention is that of Zhang Wanshan et al. (Zhang Wanshan, Xiao Yao, Liang Junjie, et al. Web text clustering method based on theme [J]. Computer Application, 2014, 34(11): 3140-3143). In that work, to address the problem that traditional Web text clustering algorithms ignore the topic information of Web text, which results in low accuracy when clustering multi-topic Web text, a topic-based Web text clustering method is proposed.
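
Problem (3) concerns discounting old review texts. The text shown here does not state how the invention implements forgetting, so the following is only a minimal sketch of one common approach, exponential time decay. The names `decay_weight` and `weighted_topic_scores`, the 90-day half-life, and the sample data are all assumptions made for illustration.

```python
from collections import defaultdict


def decay_weight(age_days, half_life_days=90.0):
    """Weight halves every `half_life_days`; a brand-new review has weight 1.0."""
    return 0.5 ** (age_days / half_life_days)


def weighted_topic_scores(reviews, half_life_days=90.0):
    """Sum decayed weights per topic.

    `reviews` is an iterable of (topic, age_days) pairs. This is an
    illustrative stand-in, not the patent's forgetting mechanism.
    """
    scores = defaultdict(float)
    for topic, age_days in reviews:
        scores[topic] += decay_weight(age_days, half_life_days)
    return dict(scores)


# Hypothetical data: three old "price" reviews, two recent "quality" reviews.
reviews = [
    ("price", 360), ("price", 360), ("price", 360),
    ("quality", 0), ("quality", 90),
]
scores = weighted_topic_scores(reviews)
```

With raw counts, "price" would dominate (3 reviews against 2); with decay, the two recent "quality" reviews outweigh the three year-old "price" reviews, which is the behavior problem (3) asks for.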


Examples

Embodiment Construction

[0051] The accompanying drawings are for illustrative purposes only and cannot be construed as limiting the patent;

[0052] For those skilled in the art, it is understandable that some well-known structures and descriptions thereof may be omitted in the drawings.

[0053] The technical solutions of the present invention will be further described below in conjunction with the accompanying drawings and embodiments.

[0054] A text topic clustering algorithm based on natural language processing, whose process is shown in Figure 1, includes the following steps:

[0055] S1. Get user comment text;

[0056] S2. Perform data preprocessing on the user comment text to obtain a user comment text corpus;

[0057] S3. Perform Chinese word segmentation on the user comment text corpus to obtain the user comment text term library;

[0058] S4. Model the user comment text term library to obtain the topic model of the comment text;

[0059] S5. Using a text topic cluster...
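
Steps S1 to S3 above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patent's method: the source does not describe its preprocessing rules or its segmentation algorithm, so `preprocess` only normalizes whitespace and drops empty or duplicate comments, and `segment` uses overlapping character bigrams as a stand-in for Chinese word segmentation. All function names and the sample comments are hypothetical.

```python
import re
from collections import Counter


def preprocess(comments):
    """S2 (assumed rules): normalize whitespace, drop empty and duplicate comments."""
    seen, corpus = set(), []
    for text in comments:
        cleaned = re.sub(r"\s+", " ", text).strip()
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            corpus.append(cleaned)
    return corpus


def segment(text):
    """S3 stand-in: overlapping character bigrams, NOT the patent's segmenter."""
    chars = [c for c in text if c.isalnum()]
    return ["".join(chars[i:i + 2]) for i in range(len(chars) - 1)]


def build_term_library(corpus):
    """Count how often each term occurs across the whole corpus."""
    counts = Counter()
    for doc in corpus:
        counts.update(segment(doc))
    return counts


# S1: hypothetical user comments, including a duplicate and an empty string.
comments = ["质量很好", "  质量很好 ", "", "价格太贵"]
corpus = preprocess(comments)
terms = build_term_library(corpus)
```

A real system would replace `segment` with a proper Chinese word segmenter; the bigram version is used here only so the sketch runs without external dependencies.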


Abstract

The invention discloses a text topic clustering algorithm based on natural language processing. First, a Chinese corpus is formed. Second, the Chinese text is preprocessed to reduce the computational cost of the subsequent algorithms. Next, a novel Chinese word segmentation and feature vectorization algorithm transforms the lexical items of the comment text from lexical text space to vector space. The dimension of the resulting text lexical vector space is then reduced, transforming it into a text topic space. Finally, text topic clustering is performed on the generated text topic model to identify what commenting users focus on for a given product and to suggest directions for improvement, for example that quality or price needs to be improved, so that the product moves closer to the needs of most users.
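
To illustrate the move from lexical text space to vector space, here is a minimal sketch that uses TF-IDF weighting as a swapped-in, well-known technique. The abstract does not say which vectorization the invention uses, so `tfidf_vectors`, `cosine`, the smoothed IDF formula, and the sample tokens are assumptions. The dimension reduction and clustering stages are omitted.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Map tokenized documents to TF-IDF vectors over a shared, sorted vocabulary."""
    vocab = sorted({term for doc in docs for term in doc})
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append([
            (tf[t] / len(doc)) * math.log((1 + n_docs) / (1 + df[t])) if doc else 0.0
            for t in vocab
        ])
    return vocab, vectors


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical, already-segmented comments.
docs = [["quality", "good"], ["quality", "poor"], ["price", "high"]]
vocab, vectors = tfidf_vectors(docs)
```

Once comments are vectors, comments that share terms (the two "quality" comments) have a positive cosine similarity while unrelated ones score zero, which is what makes the later reduction and clustering stages possible.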

Description

Technical field

[0001] The present invention relates to the field of natural language processing, and more specifically to a text topic clustering algorithm based on natural language processing.

Background technique

[0002] At present, traditional text clustering algorithms have the following main shortcomings:

[0003] (1) The accuracy of current Chinese word segmentation algorithms is not high;

[0004] (2) The accuracy of current text topic model construction algorithms is not high;

[0005] (3) Current text topic clustering algorithms cannot adequately remove the influence of historical records on current decisions; that is, they cannot gradually forget outdated review texts the way humans do. As a result, the user concerns they mine deviate from the users' latest focus of attention, which in turn leads businesses such as smart home equipment operators to make mistaken improvement decisions and suffer serious economic losses.

[0006] The method ...


Application Information

IPC(8): G06F16/35, G06F16/9535, G06K9/62
CPC: G06F18/23213, G06F18/2413
Inventors: 梁天恺, 曾碧
Owner: GUANGDONG UNIV OF TECH