Topic detection system based on a graph model

A topic detection technology based on a graph model, applied to unstructured text data retrieval and related areas of computing, that addresses the failure of existing methods to take lexical dependency information into account.

Active Publication Date: 2014-09-17
EAST CHINA NORMAL UNIV

AI Technical Summary

Problems solved by technology

From a semantic point of view, existing topic detection methods do not take into account the semantic information carried by the vocabulary, especially by entity words, or the dependency information present in the sentence in which each word occurs.

Examples

Embodiment

[0030] This embodiment takes a multi-category text collection as an example and detects its hidden topic information. The present invention is further described below, using the Sogou text classification corpus as an example and with reference to the accompanying drawings.

[0031] The Sogou text classification corpus (link: http://www.sogou.com/labs/dl/c.html) includes news texts in 9 categories: sports, culture, recruitment, education, military, information technology, health, economy, and tourism.

[0032] Referring to Figure 1, the present invention includes the following three modules:

[0033] Module 1: preprocessing module. In this embodiment, the preprocessing module uses the Harbin Institute of Technology Language Technology Platform Cloud (http://www.ltp-cloud.com/) to perform sentence segmentation, word segmentation, named entity recognition, dependency syntax analysis, and other preprocessing. The results of this module are used for correlation calcu...
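As a rough illustration of what the preprocessing results might look like to the later modules, the sketch below stores each token with an entity flag and a dependency head index, then enumerates word pairs with their distance and dependency status. The `Token` record, the `word_pairs` helper, and the toy sentence are all hypothetical; the block does not call the Language Technology Platform Cloud, and the real output format of that service may differ.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Iterator, List, Tuple


@dataclass(frozen=True)
class Token:
    """Hypothetical record for one preprocessed word (not the LTP format)."""
    text: str        # surface form after word segmentation
    is_entity: bool  # True if named entity recognition tagged the word
    head: int        # index of the dependency head in the sentence, -1 for root


def word_pairs(sentence: List[Token]) -> Iterator[Tuple[str, str, int, bool]]:
    """Yield (word_a, word_b, distance, has_dependency) for every token pair."""
    for (i, a), (j, b) in combinations(enumerate(sentence), 2):
        # A pair is dependency-linked if either token is the head of the other.
        has_dependency = a.head == j or b.head == i
        yield a.text, b.text, j - i, has_dependency


# Hand-annotated toy sentence standing in for real parser output.
sentence = [
    Token("Sogou", True, 1),
    Token("released", False, -1),
    Token("corpus", False, 1),
]
pairs = list(word_pairs(sentence))
```

Keeping the entity flag and head index on each token means the later correlation step can read entity status, dependency links, and word distance from one structure.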

Abstract

The invention discloses a topic detection system based on a graph model. The system comprises a preprocessing module, a graph structure construction module, and a subgraph detection module. The preprocessing module preprocesses a corpus test set. The graph structure construction module constructs a graph structure that represents the original model: the vertices of the graph are the lexical items of the corpus files, and the edges are the relevance between word pairs. Different weights are assigned to the relevance of a word pair according to whether the words are entity words, whether a dependency relationship exists between the two words, and the distance between them, yielding an undirected graph centered on the entity words. The subgraph detection module processes the relevance graph to obtain the vocabulary set corresponding to each subgraph, then ranks and screens these sets to obtain the final result for each subtopic. The system can automatically obtain the implicit topics in a discrete text collection, and it can be applied to feature space dimensionality reduction, relevance calculation, semantic expansion, and other related fields of natural language processing.
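The two graph-related modules described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented method: the constants `ENTITY_BONUS` and `DEPENDENCY_BONUS`, the inverse-distance base weight, the edge threshold, and the use of connected components as the subgraph detection step are all hypothetical choices, since the text available here does not give the actual weighting formula or detection algorithm.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical weighting constants; the source does not state actual values.
ENTITY_BONUS = 2.0
DEPENDENCY_BONUS = 1.5


def pair_weight(a_is_entity, b_is_entity, has_dependency, distance):
    """Assumed relevance of one word pair: closer pairs score higher."""
    weight = 1.0 / distance
    if a_is_entity or b_is_entity:
        weight *= ENTITY_BONUS
    if has_dependency:
        weight *= DEPENDENCY_BONUS
    return weight


def build_graph(sentences):
    """Build an undirected weighted graph keyed by frozenset({word_a, word_b}).

    Each sentence is a list of (word, is_entity, head_index) tuples.
    """
    graph = defaultdict(float)
    for sent in sentences:
        for (i, (wa, ea, ha)), (j, (wb, eb, hb)) in combinations(enumerate(sent), 2):
            if wa == wb:
                continue  # skip self-loops for repeated words
            has_dependency = ha == j or hb == i
            graph[frozenset((wa, wb))] += pair_weight(ea, eb, has_dependency, j - i)
    return dict(graph)


def subgraph_vocabularies(graph, threshold):
    """Stand-in subgraph detection: drop weak edges, take connected components,
    and rank the words of each component by total incident edge weight."""
    adjacency = defaultdict(set)
    strength = defaultdict(float)
    for edge, weight in graph.items():
        if weight < threshold:
            continue
        a, b = tuple(edge)
        adjacency[a].add(b)
        adjacency[b].add(a)
        strength[a] += weight
        strength[b] += weight
    seen, topics = set(), []
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            component.append(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        topics.append(sorted(component, key=lambda w: (-strength[w], w)))
    return topics


# Toy input: two hand-annotated sentences with no shared vocabulary.
sentences = [
    [("Yao", True, 1), ("scores", False, -1), ("points", False, 1)],
    [("army", True, 1), ("deploys", False, -1), ("tanks", False, 1)],
]
graph = build_graph(sentences)
topics = subgraph_vocabularies(graph, threshold=1.0)
```

On this toy input the two sentences share no words, so each one forms its own component and yields one ranked vocabulary set. On a real corpus the threshold and the ranking criterion would need tuning, and a clustering algorithm could replace the connected-components step.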

Description

Technical field

[0001] The present invention relates to technical fields such as information extraction, shallow semantic analysis, feature space dimensionality reduction, named entity recognition, dependency syntax parsing, clustering algorithms, and undirected graph models, and specifically to a topic detection system that builds a graph model from entity words and syntactic information to detect hidden topics in discrete text collections.

Background

[0002] Shallow semantic analysis has important applications in the field of natural language processing. Judging the relevance of documents requires considering their implicit semantics, and how to find similar concepts or topics in discrete documents is a hot issue in text mining research. As one of the important techniques of shallow semantic analysis, the topic model is among the more effective models, and it also has important applications in other fields of machine learning. ...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30, G06F17/27
CPC: G06F16/35, G06F40/253
Inventors: 林欣, 赵昂, 杨静, 贺樑
Owner: EAST CHINA NORMAL UNIV