
Method and system for analyzing potential theme phrases of text data

A text data and topic phrase technology, applied in the fields of electronic digital data processing, special data processing applications, instruments, etc., that addresses problems such as the inability to obtain effective topic phrase results, the lack of statistical information for phrases, and the poor readability, consistency, and visualization of results.

Active Publication Date: 2019-08-16
HUAIYIN INSTITUTE OF TECHNOLOGY

AI Technical Summary

Problems solved by technology

[0004] Purpose of the invention: To overcome the deficiencies of the prior art, the present invention provides a method for analyzing potential topic phrases of text data, which addresses the poor readability and consistency of the topic results obtained from traditional topic model trai...




Embodiment Construction

[0092] The present invention will be further described in detail with reference to the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. Based on the embodiments of the present invention, all other embodiments obtained by persons of ordinary skill in the art without making creative efforts belong to the protection scope of the present invention.

[0093] The present invention provides a method for analyzing potential topic phrases of text data which, as shown in Figure 1, includes the following steps:

[0094] S1: Collect a text data set and perform word segmentation on it to obtain a word-level representation of the text data set.

[0095] S2: From the words in the text data set, extract the effective phrases formed by word collocation, and obtain a mixed representation consisting of the phrase set and the words that were not matched into effective phrases.

[0096] This step specifical...
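Steps S1 and S2 can be illustrated with a minimal sketch. The source text is cut off before it states how effective phrases are selected, so everything below is an assumption for illustration only: `segment` stands in for a real word segmenter by splitting on whitespace, and `find_phrases` keeps adjacent word pairs that pass a frequency threshold and a pointwise mutual information (PMI) threshold. The function names, the `min_count` and `min_pmi` parameters, and the use of PMI are hypothetical and are not taken from the patent.

```python
import math
from collections import Counter


def segment(text):
    """S1 placeholder (assumption): input is already whitespace-separated.
    Chinese text would need a real word segmenter here."""
    return text.split()


def find_phrases(docs, min_count=2, min_pmi=1.0):
    """S2 sketch (assumption): treat an adjacent word pair as an effective
    phrase if it occurs at least min_count times and its PMI is high enough."""
    unigrams = Counter(w for doc in docs for w in doc)
    bigrams = Counter(pair for doc in docs for pair in zip(doc, doc[1:]))
    total = sum(unigrams.values())
    phrases = set()
    for (a, b), n in bigrams.items():
        if n < min_count:
            continue
        # PMI estimated from raw counts: log( p(a,b) / (p(a) * p(b)) )
        pmi = math.log(n * total / (unigrams[a] * unigrams[b]))
        if pmi >= min_pmi:
            phrases.add((a, b))
    return phrases


def to_mixed(doc, phrases):
    """Build the mixed representation: matched pairs become one phrase token,
    unmatched words are kept as they are."""
    out, i = [], 0
    while i < len(doc):
        if i + 1 < len(doc) and (doc[i], doc[i + 1]) in phrases:
            out.append(doc[i] + "_" + doc[i + 1])
            i += 2
        else:
            out.append(doc[i])
            i += 1
    return out


if __name__ == "__main__":
    raw = [
        "topic model training uses text data",
        "text data mining finds topic model output",
        "the topic model reads text data",
    ]
    docs = [segment(t) for t in raw]
    phrases = find_phrases(docs)
    for doc in docs:
        print(to_mixed(doc, phrases))
```

The greedy left-to-right merge in `to_mixed` is the simplest choice for a sketch; it only handles two-word phrases and does not resolve overlapping candidates.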


Abstract

The invention discloses a method and system for analyzing potential theme phrases of text data. The method comprises the steps of: collecting a text data set and performing word segmentation on it to obtain a word-level representation of the text data set; extracting the effective phrases formed by word collocation from the words in the text data set, and obtaining a mixed representation consisting of the phrase set and the words not matched into effective phrases; performing word vector training on the text data set in the mixed representation to obtain a corresponding word vector model; constructing the DR-Phrase LDA model and solving for each of its parameters; and training DR-Phrase LDA and outputting the potential theme phrases of the text data according to the training result. The invention adopts a phrase topic model based on word vectors. In training the probabilistic topic model, the model uses regularities of the Chinese language to improve the statistical information available for phrases; specifically, a word vector method measures the relation between the component words of a phrase, quantitatively reflecting the semantic relations among words both across the whole text and locally within a phrase, which yields higher model precision.
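The abstract says that a word vector method measures the relation between the component words of a phrase, but it does not give the formula. As a hedged illustration only, one common way to quantify such a relation is the cosine similarity between word vectors, averaged over all pairs of component words. The names `cosine` and `phrase_cohesion` and the toy two-dimensional vectors are hypothetical; this is not the measure defined by DR-Phrase LDA.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def phrase_cohesion(words, vectors):
    """Assumed measure: mean pairwise cosine similarity of component words.
    A single-word 'phrase' is given cohesion 1.0 by convention."""
    pairs = [
        (words[i], words[j])
        for i in range(len(words))
        for j in range(i + 1, len(words))
    ]
    if not pairs:
        return 1.0
    return sum(cosine(vectors[a], vectors[b]) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Toy vectors (made up for illustration); a trained word vector
    # model would supply these in practice.
    vectors = {"topic": [1.0, 0.0], "model": [1.0, 0.0], "yarn": [0.0, 1.0]}
    print(phrase_cohesion(["topic", "model"], vectors))
    print(phrase_cohesion(["topic", "yarn"], vectors))
```

In a fuller pipeline, the vectors would come from the word vector model trained on the mixed representation, so that phrase tokens and unmatched words share one embedding space.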

Description

Technical field

[0001] The invention relates to the field of text data mining and analysis, and in particular to a method and system for analyzing potential topic phrases of text data.

Background

[0002] With the development of information technology, a large amount of electronic text has accumulated in various fields, resulting in information overload. To help people quickly retrieve, find, and effectively use this information, semantic and structural analysis of text has become a current research hotspot. In particular, extracting potential topic information from text data is a key technology for advanced applications such as information retrieval, recommendation systems, and automatic summarization. Existing common methods use traditional "bag of words"-based probabilistic topic models such as LDA and PLDA for text topic analysis. The topic results produced by these methods are presented in the form of keywords, while h...


Application Information

IPC(8): G06F17/27
CPC: G06F40/216, G06F40/289
Inventors: 马甲林, 张琳, 程清雯
Owner: HUAIYIN INSTITUTE OF TECHNOLOGY