Text information associating and clustering collecting processing method based on domain knowledge model

This technology applies domain knowledge to text association analysis and clustered collection processing. It addresses the weak pertinence and large deviations of intelligence associated by conventional methods, and improves association accuracy.

Active Publication Date: 2016-06-15
10TH RES INST OF CETC

AI Technical Summary

Problems solved by technology

[0007] The purpose of the present invention is to address a shortcoming of traditional methods: they ignore the guiding role of domain knowledge during text information association, so the associated information deviates considerably and is weakly targeted. The invention provides a method that introduces intelligence domain knowledge and prior information into the association analysis of text intelligence content, associates and integrates intelligence according to the topic type of the target event, and improves the correct association rate for the topic category of text intelligence.


Examples


Embodiment Construction

[0017] To aid understanding of the present invention, we first introduce topic templates based on domain knowledge and the topic graph model used to learn and train them.

[0018] Refer to Figure 1. According to the present invention, intelligence domain knowledge modeling and topic graph technology guide the association analysis of text information:

Step S1, text information preprocessing: collect a text intelligence training set; perform word segmentation and part-of-speech tagging; remove stop words; retain nouns and verbs; and apply stemming to obtain the normalized word segmentation sequence of the training set.

Step S2, feature vocabulary vector extraction: extract the feature vocabulary vector of the training set's word segmentation sequence through Chinese named entity recognition and domain dictionary lookup.

Step S3, event topic vocabulary learning: use the topic graph model to learn and train to extract...
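Steps S1 and S2 can be illustrated with a minimal Python sketch. It is not the patented implementation: it assumes the text has already been segmented and part-of-speech tagged by some external tool, and the stop-word list, domain dictionary, tag names (`n`, `v`), and function names are all hypothetical placeholders. Stemming is omitted for brevity.

```python
from collections import Counter

# Hypothetical stop-word list and domain dictionary, for illustration only.
STOP_WORDS = {"the", "a", "of", "is"}
DOMAIN_DICTIONARY = {"missile", "radar", "launch", "frigate"}
KEPT_POS = {"n", "v"}  # assumed tag names for nouns and verbs


def preprocess(tagged_tokens):
    """Step S1 (sketch): drop stop words and keep only nouns and verbs."""
    return [
        word.lower()
        for word, pos in tagged_tokens
        if pos in KEPT_POS and word.lower() not in STOP_WORDS
    ]


def extract_features(words, entities=()):
    """Step S2 (sketch): count words that appear in the domain dictionary
    or were flagged as named entities by an external recognizer."""
    entity_set = {e.lower() for e in entities}
    return Counter(w for w in words if w in DOMAIN_DICTIONARY or w in entity_set)


# Input is assumed to be (word, part-of-speech) pairs from a segmenter.
tagged = [
    ("The", "x"), ("frigate", "n"), ("launch", "v"), ("a", "x"),
    ("missile", "n"), ("quickly", "d"), ("Radar", "n"), ("is", "v"),
    ("Tokyo", "n"),
]
words = preprocess(tagged)
features = extract_features(words, entities=["Tokyo"])
print(words)
print(dict(features))
```

The resulting `Counter` acts as a sparse feature vocabulary vector. A real system would replace the hard-coded sets with a maintained domain dictionary and a Chinese named entity recognizer.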


Abstract

The invention provides a text information association and clustered collection processing method based on a domain knowledge model. The method comprises the following steps: a text intelligence training set is collected and preprocessed, including stemming, and the feature word vectors of the training set's word segmentation sequence are extracted through Chinese named entity recognition and domain dictionary lookup; representative feature words of a target event are extracted through topic graph model learning and training, and a topic association membership weight is calculated for each; a feature word set is built from the trained membership weights, and an event topic word template is constructed; the feature word vectors of the word segmentation sequence of text received in real time are extracted through the same Chinese named entity recognition and domain dictionary lookup; the similarity distance between these feature word vectors and every target event knowledge template is calculated; and the association of multiple texts with the same topic target event is determined according to a similarity threshold, with the texts then classified and regrouped by a similarity distance ordering rule.
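The template-matching stage described in the abstract can be sketched as follows. The source does not specify which similarity distance is used, so this example assumes cosine similarity; the template names, weights, threshold value, and function names are hypothetical and chosen only for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(word, 0.0) for word, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def associate(text_vector, templates, threshold=0.3):
    """Score a text against every event topic template, keep the scores
    at or above the threshold, and sort them from most to least similar."""
    scored = [
        (name, cosine_similarity(text_vector, template))
        for name, template in templates.items()
    ]
    matches = [(name, score) for name, score in scored if score >= threshold]
    return sorted(matches, key=lambda pair: pair[1], reverse=True)


# Hypothetical event topic word templates: feature word -> membership weight.
templates = {
    "missile_test": {"missile": 0.9, "launch": 0.8, "radar": 0.3},
    "naval_exercise": {"frigate": 0.9, "exercise": 0.7, "radar": 0.4},
}

# Feature word vector of a newly received text (assumed already extracted).
incoming = {"missile": 1.0, "launch": 1.0, "frigate": 1.0}
ranked = associate(incoming, templates)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

Texts whose best score falls below the threshold match no template and would be left unassociated. In practice the threshold would have to be tuned on the training set rather than fixed at an arbitrary value.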

Description

Technical field

[0001] The invention relates to the field of text association and classification by subject content; that is, automatic text association analysis and clustered collection processing implemented with computer processing technology.

Background

[0002] Text intelligence analysis is an important part of a comprehensive information platform. As intelligence text is now collected through many channels, its sources are increasingly diverse and the volume of data keeps growing, which makes text intelligence analysis increasingly difficult. One way to analyze text intelligence efficiently and effectively is to first associate and classify it, then aggregate and analyze the items that are highly correlated and closely related. At present, the general steps of text intelligence analysis are as follows: (1) Transform ...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
CPC: G06F16/355, G06F16/374
Inventor: 陈怀新, 袁伟, 张宇, 俞鸿波, 谢卫
Owner: 10TH RES INST OF CETC