Adaptive latent Dirichlet model selection method and apparatus

An adaptive latent Dirichlet model selection technique, applied in special data processing applications, instruments, unstructured text data retrieval, etc., that addresses problems such as the poor accuracy of LDA models and its effect on subsequent analysis and calculation.

Active Publication Date: 2016-07-06
NAT UNIV OF DEFENSE TECH +1
3 Cites, 8 Cited by

AI Technical Summary

Problems solved by technology

In the calculation of the LDA model, the setting of the topic number K is limited by personal experience, and corpora of different sizes have different characteristics, even different corpora of the same size have

Method used

Examples

Embodiment Construction

[0042] The accompanying drawings constituting a part of this application are used to provide further understanding of the present invention, and the schematic embodiments and descriptions of the present invention are used to explain the present invention, and do not constitute an improper limitation of the present invention.

[0043] Referring to Figure 1, one aspect of the present invention provides a method for adaptive latent Dirichlet model selection, comprising the following steps:

[0044] Step S100: Convert the corpus into a document-word frequency matrix F for LDA model calculation, set the initial number of topics according to the corpus size, and iteratively compute the LDA model;

[0045] Step S200: After each round of iteration, the average cosine distance similarity r of the topic-word probability distribution of the LDA model is increased or decreased relative to the average cosine distance similarity r_old in the previous round of iteration calculation, and the nu...
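
Steps S100 and S200 rely on two quantities: the document-word frequency matrix F and the average cosine similarity r between the topic-word probability distributions. The sketch below shows one way these could be computed with NumPy. It is not the patent's implementation: the whitespace tokenizer, the function names `doc_word_matrix` and `average_topic_cosine`, and the choice to average over all distinct pairs of topics are assumptions made for illustration.

```python
from collections import Counter

import numpy as np


def doc_word_matrix(docs):
    """Build a document-word frequency matrix F.

    Rows are documents, columns are vocabulary terms (sorted alphabetically).
    Tokenization by lowercasing and whitespace splitting is an assumption.
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({word for tokens in tokenized for word in tokens})
    column = {word: j for j, word in enumerate(vocab)}
    F = np.zeros((len(docs), len(vocab)), dtype=int)
    for i, tokens in enumerate(tokenized):
        for word, count in Counter(tokens).items():
            F[i, column[word]] = count
    return F, vocab


def average_topic_cosine(phi):
    """Mean cosine similarity over all distinct pairs of topics.

    phi is a K x V array whose rows are topic-word probability
    distributions (assumed non-zero). Averaging over the K*(K-1)/2
    unordered pairs is an assumed reading of "average cosine similarity".
    """
    phi = np.asarray(phi, dtype=float)
    num_topics = phi.shape[0]
    if num_topics < 2:
        raise ValueError("at least two topics are needed")
    unit = phi / np.linalg.norm(phi, axis=1, keepdims=True)
    similarity = unit @ unit.T                  # K x K cosine matrix
    upper = np.triu_indices(num_topics, k=1)    # pairs with i < j
    return float(similarity[upper].mean())
```

Because topic-word distributions are non-negative, r computed this way lies between 0 and 1: values near 1 mean the topics have nearly identical word distributions, and values near 0 mean they share almost no probability mass.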

Abstract

The present invention provides an adaptive latent Dirichlet model selection method and apparatus. The method comprises: initializing an empirical topic number K according to the corpus scale; continuously updating the topic number K by calculating the average cosine distance similarity of the topic-word probability distributions of an LDA model; obtaining, through multiple rounds of iterative calculation, a K value that suits the current corpus better than the initial topic number; and outputting the corresponding LDA model as the final result model. Dynamically adjusting the topic number K avoids, to some degree, the unreasonable models caused by setting K subjectively from personal experience, and improves the precision of the model.

Description

Technical field

[0001] The invention relates to the technical field of natural language processing, in particular to a method and device for adaptive latent Dirichlet model selection.

Background art

[0002] With the rapid development of the Internet, the amount of information is increasing day by day, and people's demand for efficient retrieval and acquisition of information is increasingly strong. Since most online information is represented as text, automatic classification of text information has become an important research topic in the field of information retrieval. Automatic text classification determines the associated category from the text content, and methods based on statistical machine learning are the most widely used. One of the common models is the Latent Dirichlet Allocation (LDA) model.

[0003] The LDA model is a topic model that can be used to identify hidden topic information in large-scale...

Claims

Application Information

IPC(8): G06F17/30
CPC: G06F16/35, G06F16/951
Inventor: 程光权, 陈发君, 刘忠, 黄金才, 朱承, 修保新, 陈超, 冯旸赫, 龙开亮
Owner NAT UNIV OF DEFENSE TECH