Theme identification method, system and equipment based on theme co-occurrence network and external knowledge

A topic recognition technique based on a topic co-occurrence network and external knowledge, applied in the field of text recognition. It addresses shortcomings of earlier approaches, such as not further mining the co-occurrence network in the data, not considering the characteristics of subwords, and data redundancy. The intended effects are fuller and more efficient use of the corpus, richer and more complete information, and improved model performance.

Active Publication Date: 2021-05-14
XI AN JIAOTONG UNIV

AI Technical Summary

Problems solved by technology

[0007] 3. A control method and device for extracting Chinese entity relations based on word co-occurrence, patent number: CN201110001355.9. Patent 1 counts the subject keywords contained in each text and takes the subject keyword that occurs most often as the subject of the text. Patent 2 achieves semi-supervised classificati


Examples

Embodiment Construction

[0040] The present invention will be further described in detail below in conjunction with the accompanying drawings and embodiments.

[0041] Referring to Figure 1, a self-training topic recognition method based on a topic co-occurrence network and external knowledge comprises three stages: constructing a topic co-occurrence network from labeled data; constructing a switch module that integrates external knowledge with the topic co-occurrence network; and improving the self-training of the domain knowledge text topic recognition model by introducing the switch module. The specific process is as follows:

[0042] Constructing a topic co-occurrence network based on labeled data: first, initialize a topic co-occurrence network in which each topic is represented by a node, every pair of nodes is connected by an edge, and every edge weight is 0. Then, for each topic-labeled domain knowledge text, perform word segmentation, identify and record the subject keywords and the subwords of subject keywords that appear in the text, and record the...
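The initialization step in [0042] can be illustrated with a minimal Python sketch. The paragraph is truncated before it states how edge weights are updated, so the update rule here (add 1 to an edge whenever its two topics are both detected in one text) is an assumption for illustration only. The names `init_cooccurrence_network`, `update_network` and `keyword_to_topic`, and the sample topics, are hypothetical and do not come from the patent.

```python
from itertools import combinations


def init_cooccurrence_network(topics):
    """One node per topic, an edge between every pair of nodes, all weights 0."""
    return {frozenset(pair): 0 for pair in combinations(sorted(topics), 2)}


def topics_in_text(tokens, keyword_to_topic):
    """Topics whose keywords or subwords appear among the segmented tokens."""
    return {keyword_to_topic[tok] for tok in tokens if tok in keyword_to_topic}


def update_network(network, tokens, keyword_to_topic):
    """ASSUMPTION (not stated in the source): each pair of topics detected
    in the same text increases the weight of its edge by 1."""
    found = topics_in_text(tokens, keyword_to_topic)
    for pair in combinations(sorted(found), 2):
        network[frozenset(pair)] += 1
    return found


# Hypothetical topics and keyword/subword table, for illustration only.
topics = ["sorting", "graph", "tree"]
keyword_to_topic = {
    "quicksort": "sorting",
    "sort": "sorting",      # treated here as a subword of "quicksort"
    "dijkstra": "graph",
    "bst": "tree",
    "tree": "tree",
}

network = init_cooccurrence_network(topics)
update_network(network, ["quicksort", "uses", "a", "tree"], keyword_to_topic)
update_network(network, ["dijkstra", "and", "bst"], keyword_to_topic)
```

Storing each edge under a `frozenset` of its two topics keeps the network undirected, so the lookup does not depend on the order in which the topics are named.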


Abstract

The invention discloses a topic recognition method based on a topic co-occurrence network and external knowledge. The method comprises three steps. (1) Constructing a topic co-occurrence network from annotated data: topic subwords are detected in existing topic-annotated domain knowledge texts, and the topic co-occurrence network is built from them. (2) Constructing a switch module that fuses external knowledge with the topic co-occurrence network: the co-occurrence network is used to rank the topic-annotated domain knowledge texts by information richness, and this ranking is combined with external knowledge to form the switch module. (3) Improving self-training of the domain knowledge text topic recognition model by introducing the switch module: the model is trained by self-training so that information in unannotated domain knowledge texts is used as far as possible, while the switch module prevents the loss of generalization performance that results when self-training learns from unannotated texts non-selectively. The limited corpus information is thereby used fully and efficiently, and the performance of the domain knowledge text topic recognition model is improved.
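The idea of gating self-training with a switch can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the patented procedure: the model is a toy keyword-vote classifier, and the hypothetical `switch` function uses only an external keyword table, omitting the information richness ranking that the abstract also feeds into the switch module. All names and data below are invented for the example.

```python
from collections import Counter, defaultdict


class KeywordVoteModel:
    """Toy stand-in for a topic recognition model: each token votes for the
    topics it was seen with during training."""

    def fit(self, examples):
        self.counts = defaultdict(Counter)
        for tokens, topic in examples:
            for tok in tokens:
                self.counts[tok][topic] += 1
        return self

    def predict(self, tokens):
        votes = Counter()
        for tok in tokens:
            votes.update(self.counts.get(tok, {}))
        return votes.most_common(1)[0][0] if votes else None


def switch(tokens, predicted_topic, external_keywords):
    """HYPOTHETICAL gate: accept a pseudo-label only if the external
    knowledge lists at least one token of the text under that topic."""
    if predicted_topic is None:
        return False
    allowed = external_keywords.get(predicted_topic, set())
    return any(tok in allowed for tok in tokens)


def self_train(labeled, unlabeled, external_keywords, rounds=3):
    """Self-training that learns only from pseudo-labels the switch accepts."""
    train, pool = list(labeled), list(unlabeled)
    model = KeywordVoteModel().fit(train)
    for _ in range(rounds):
        accepted, rejected = [], []
        for tokens in pool:
            topic = model.predict(tokens)
            if switch(tokens, topic, external_keywords):
                accepted.append((tokens, topic))
            else:
                rejected.append(tokens)
        if not accepted:
            break  # nothing new passed the gate, so stop early
        train.extend(accepted)
        pool = rejected
        model = KeywordVoteModel().fit(train)
    return model, pool


# Invented example data.
labeled = [(["quicksort", "pivot"], "sorting"), (["dijkstra", "edge"], "graph")]
unlabeled = [["pivot", "partition"], ["edge", "banana"], ["partition", "merge"]]
external_keywords = {"sorting": {"partition", "quicksort"},
                     "graph": {"dijkstra", "vertex"}}

model, remaining = self_train(labeled, unlabeled, external_keywords)
```

In this toy run, the text `["edge", "banana"]` is never added to the training set because the external table does not support its predicted topic, which is the kind of selective learning the switch module is meant to enforce.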

Description

Technical field

[0001] The invention belongs to the technical field of text recognition, and in particular relates to a topic recognition method, system and equipment based on a topic co-occurrence network and external knowledge.

Background

[0002] The explosive growth of the total amount of knowledge in the Internet age has made the problem of knowledge fragmentation increasingly prominent. Because there is no unified form of knowledge organization and management, users usually have to spend more time searching massive amounts of fragmented knowledge for valuable domain knowledge texts to learn from, so information acquisition is inefficient. Classifying and storing domain knowledge texts according to the knowledge topics they describe can improve the efficiency of retrieval and learning, and can also provide technical support for other downstream applications related to domain knowledge.

[0003] Domain knowledge texts usually contain a lot of domain...

Application Information

IPC(8): G06F16/34, G06F40/289, G06N3/08
CPC: G06F16/345, G06F40/289, G06N3/08
Inventor: 魏笔凡, 祁敬超, 刘均, 郑庆华, 杨祎, 罗强, 洪振杰, 武雨辰
Owner XI AN JIAOTONG UNIV