Short text clustering analysis method, device and terminal device
A short text clustering analysis technology, applied in the field of text analysis, that addresses the problem of low clustering accuracy and improves both efficiency and accuracy
Examples
Embodiment 1
[0051] Referring to Figure 1, a schematic flow diagram of an embodiment of a short text clustering analysis method is provided, described in detail as follows:
[0052] Step S101, obtaining a short text data set to be clustered, and performing preprocessing on the short text data set to obtain an initial word set including at least three parts of speech.
[0053] Short texts are composed of words of multiple parts of speech that together express emotional information. When analyzing short texts, it is necessary to split the short text data set into word sets covering multiple parts of speech, and to remove words that have little impact on the emotional information, such as words with low frequency. Specifically, this embodiment can divide each short text into several words through a word segmentation algorithm, and can delete word stems, stop words, and words with low document frequency through a word filtering method. The purpose of this step is to reduce the dimensionality of the data set and denoise it, t...
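The filtering described in step S101 can be illustrated with a minimal sketch. It is not the patent's implementation: the regular-expression tokenizer stands in for a real word segmentation algorithm, the `STOP_WORDS` set and the `min_doc_freq` threshold are hypothetical, and part-of-speech tagging is omitted entirely.

```python
import re
from collections import Counter

# Hypothetical stop-word list; the source does not specify one.
STOP_WORDS = {"the", "a", "an", "is", "it", "this", "and", "to", "of"}

def tokenize(text):
    """Lowercase and extract letter runs (a stand-in for a real word segmenter)."""
    return re.findall(r"[a-z']+", text.lower())

def preprocess(texts, min_doc_freq=2):
    """Return one filtered token list per short text."""
    # Drop stop words first.
    tokenized = [[w for w in tokenize(t) if w not in STOP_WORDS] for t in texts]
    # Document frequency: the number of texts in which each word appears.
    doc_freq = Counter(w for tokens in tokenized for w in set(tokens))
    # Drop words whose document frequency is below the (assumed) threshold.
    return [[w for w in tokens if doc_freq[w] >= min_doc_freq] for tokens in tokenized]

texts = [
    "The battery life is great",
    "Great screen and great battery",
    "This screen is too dim",
]
filtered = preprocess(texts)
```

With these three sample texts, words that occur in only one text ("life", "too", "dim") are dropped, which shrinks the vocabulary that later steps have to handle.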
Embodiment 2
[0118] Corresponding to the short text clustering analysis method described in the first embodiment above, Figure 5 shows the structural block diagram of the short text clustering analysis device in Embodiment 2 of the present invention. For ease of description, only the parts related to this embodiment are shown.
[0119] The device includes: a preprocessing module 110, a feature extraction module 120, a knowledge pair determination module 130 and a topic clustering module 140.
[0120] The preprocessing module 110 is used to obtain the short text data set to be clustered, and perform preprocessing on the short text data set to obtain an initial word set including at least three parts of speech.
[0121] The feature extraction module 120 is used to perform feature extraction on the initial word set to obtain a feature word set including a topic feature word set and a topic associated word set.
[0122] The knowledge pair determination module 130 is used to determine a p...
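The composition of the four modules can be sketched as a simple pipeline. This is only an illustration of the structure: the class name, the field names, and the assumption that each module consumes the previous module's output are all hypothetical, and the stand-in callables below do not implement any of the patent's steps.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class ShortTextClusteringDevice:
    """Chains four modules; each field is a hypothetical callable standing in for a module."""
    preprocessing: Callable[[Any], Any]                 # module 110
    feature_extraction: Callable[[Any], Any]            # module 120
    knowledge_pair_determination: Callable[[Any], Any]  # module 130
    topic_clustering: Callable[[Any], Any]              # module 140

    def run(self, short_texts):
        # Assumed data flow: each module consumes the previous module's output.
        words = self.preprocessing(short_texts)
        features = self.feature_extraction(words)
        pairs = self.knowledge_pair_determination(features)
        return self.topic_clustering(pairs)

# Stand-in callables that only record the call order and pass data through.
trace = []

def stage(name):
    def module(data):
        trace.append(name)
        return data
    return module

device = ShortTextClusteringDevice(stage("110"), stage("120"), stage("130"), stage("140"))
result = device.run(["example short text"])
```

Passing the modules in as callables keeps each one replaceable, so a real segmenter or clustering routine could be substituted without changing the pipeline itself.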
Embodiment 3
[0130] Figure 6 is a schematic diagram of the terminal device 100 provided in Embodiment 3 of the present invention. As shown in Figure 6, the terminal device 100 described in this embodiment includes: a processor 150, a memory 160, and a computer program 161 stored in the memory 160 and operable on the processor 150, such as a program for the short text clustering analysis method. When the processor 150 executes the computer program 161, it implements the steps of the short text clustering analysis method embodiments described above, for example steps S101 to S104 shown in Figure 1. Alternatively, when the processor 150 executes the computer program 161, it implements the functions of the modules/units in the device embodiment described above, for example the functions of modules 110 to 140 shown in Figure 5.
[0131] Exemplarily, the computer program 161 can be divided into one or more modules/units, and the one or more modules/units are stored in the memor...