A knowledge topic short text hierarchical classification method based on topological characteristic extension

A technology of topological features and hierarchical classification, applied in special data processing applications, instruments, electrical and digital data processing, etc.

Active Publication Date: 2017-07-28
XI AN JIAOTONG UNIV

AI Technical Summary

Problems solved by technology

Existing methods are not aimed at short texts in knowledge domains. They do not take into account the heterogeneity of knowledge topics or the hierarchical structure of the knowledge system.



Examples


Embodiment Construction

[0073] The present invention will be further described in detail below in conjunction with specific embodiments, which are for explanation rather than limitation of the present invention.

[0074] The method for hierarchical classification of short texts based on topological feature expansion of knowledge topics provided by the present invention includes the following three processes:

[0075] 1) Initial text feature construction:

[0076] 1-1) Preprocess the short texts to construct a short text file system. Preprocessing includes removing punctuation marks, removing redundant spaces, removing stop words, and lemmatizing words, that is, restoring inflected forms to their base form. Lemmatization uses the open-source CoreNLP system from Stanford University.
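The preprocessing steps in 1-1) can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the stop-word list is a tiny illustrative sample, `TOY_LEMMAS` is a hypothetical lookup table standing in for the CoreNLP lemmatizer named in the text, and lowercasing is an added assumption so that stop-word matching ignores case.

```python
import string

# Illustrative stop-word sample; a real pipeline would use a fuller list.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "in", "and", "to"}

# Hypothetical stand-in for a lemmatizer. The source uses Stanford CoreNLP,
# which is NOT called here.
TOY_LEMMAS = {"trees": "tree", "stores": "store", "nodes": "node", "sorted": "sort"}

# Translation table that maps every ASCII punctuation mark to a space.
_PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def preprocess(text):
    """Return the cleaned token list for one short text."""
    no_punct = text.translate(_PUNCT_TO_SPACE)      # remove punctuation
    tokens = no_punct.lower().split()                # split() collapses extra spaces
    tokens = [t for t in tokens if t not in STOP_WORDS]   # remove stop words
    return [TOY_LEMMAS.get(t, t) for t in tokens]    # toy lemmatization


if __name__ == "__main__":
    print(preprocess("A binary tree  stores nodes, sorted!"))
```

In practice the lookup table would be replaced by calls to a real lemmatizer; the order of the other steps stays the same.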

[0077] 1-2) The method of information entropy is used for text feature selection, and the calculation process is as follows:

[0078]

[0079] Where: T_i is a sub-topic of knowledge topic T, and the short text file system ...
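The formula in [0078] is not reproduced in this text, so the following is only a sketch of one common entropy-based selection scheme, not necessarily the patent's exact calculation. It assumes that each term's occurrences are counted per sub-topic T_i, that the entropy of that distribution is computed, and that terms with the lowest entropy (those concentrated in few sub-topics) are kept. The function names and the parameter `k` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def term_entropy(docs_by_subtopic):
    """Map each term to the entropy of its distribution over sub-topics.

    docs_by_subtopic: dict of sub-topic name -> list of token lists.
    """
    counts = defaultdict(Counter)  # term -> Counter(sub-topic -> frequency)
    for subtopic, docs in docs_by_subtopic.items():
        for tokens in docs:
            for tok in tokens:
                counts[tok][subtopic] += 1

    entropy = {}
    for term, per_topic in counts.items():
        total = sum(per_topic.values())
        h = 0.0
        for c in per_topic.values():
            p = c / total                 # assumed estimate of p(T_i | term)
            h -= p * math.log2(p)
        entropy[term] = h
    return entropy


def select_features(docs_by_subtopic, k):
    """Keep the k terms with the lowest entropy (ties broken alphabetically)."""
    entropy = term_entropy(docs_by_subtopic)
    return sorted(entropy, key=lambda t: (entropy[t], t))[:k]


if __name__ == "__main__":
    docs = {
        "tree": [["node", "root"], ["node", "leaf"]],
        "graph": [["node", "edge"]],
    }
    print(term_entropy(docs))
    print(select_features(docs, 3))
```

A term that appears in only one sub-topic gets entropy 0, while a term spread evenly over all sub-topics gets the maximum value, which is why low entropy is treated here as a sign of a discriminative feature.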


Abstract

The invention provides a hierarchical classification method for knowledge topic short texts based on topological feature extension. The method organizes and manages knowledge topic short texts effectively and addresses the information overload caused by the massive volume of knowledge short texts on the Internet. It comprises three steps: 1) construction of initial text features; 2) short text feature extension based on topological features; 3) transfer learning among heterogeneous knowledge topics. Collecting the lengths of the short texts for a number of knowledge topics and running a preliminary quantitative statistical analysis makes the degree of feature sparsity in knowledge topic short texts clear. A knowledge topic short text network is then built from term co-occurrence among short texts and analyzed, and community features are selected to extend the text features. The difference between domains is measured by computing the KL divergence among knowledge topics, and auxiliary data are selected on that basis. By converting hierarchical classification into multi-class classification, the method can organize and manage knowledge topic short texts effectively.
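The KL-divergence step mentioned in the abstract can be illustrated with a small sketch. It rests on several assumptions that the text does not state: each knowledge topic is represented by a smoothed unigram term distribution, divergence is computed from the target topic to each candidate topic, and candidates with the smallest divergence are preferred as sources of auxiliary data. The function names and the smoothing constant `alpha` are hypothetical.

```python
import math
from collections import Counter


def term_distribution(docs, vocab, alpha=1.0):
    """Additively smoothed unigram distribution of a topic over a shared vocabulary."""
    counts = Counter(tok for tokens in docs for tok in tokens)
    total = sum(counts[w] for w in vocab) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def kl_divergence(p, q):
    """KL(p || q) for two distributions defined over the same keys."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)


def rank_auxiliary_topics(target_docs, candidates):
    """Rank candidate topics by KL(target || candidate), smallest first.

    candidates: dict of topic name -> list of token lists.
    """
    vocab = {tok for tokens in target_docs for tok in tokens}
    for docs in candidates.values():
        vocab |= {tok for tokens in docs for tok in tokens}
    vocab = sorted(vocab)

    p = term_distribution(target_docs, vocab)
    scores = {
        name: kl_divergence(p, term_distribution(docs, vocab))
        for name, docs in candidates.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1])


if __name__ == "__main__":
    target = [["tree", "node", "root"]]
    candidates = {
        "graph": [["node", "edge", "tree"]],
        "poetry": [["verse", "rhyme", "meter"]],
    }
    for name, score in rank_auxiliary_topics(target, candidates):
        print(name, round(score, 4))
```

Smoothing keeps every probability strictly positive, so the logarithm is always defined even when a term is absent from one of the topics.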

Description

Technical field

[0001] The invention relates to the field of data mining, in particular to a hierarchical classification method for knowledge topic short texts based on topological feature extension.

Background technique

[0002] With the development of science and technology and the explosive growth of human knowledge, open knowledge sources on the Internet have become important channels for people to exchange information and acquire knowledge. They have greatly promoted the dissemination and application of knowledge, but they have also aggravated knowledge fragmentation. Knowledge fragmentation easily causes cognitive overload in learners, leads to the "distraction effect", and can also cause problems such as cognitive bias. Since the knowledge carriers of each open knowledge source are "short texts", how to effectively organize and manage short texts is the key to solving the problem of know...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
CPC: G06F16/355
Inventors: 魏笔凡, 吴蓓, 刘均, 郑庆华, 郭朝彤, 郑元浩, 吴科炜
Owner: XI AN JIAOTONG UNIV