Natural language theme classification method and device

A natural language topic classification technology, applied in neural learning methods, text database clustering/classification, text database querying, etc. It addresses the inability to select features adaptively and the resulting limits on classification accuracy, with the effect of improving classification accuracy and avoiding feature dependencies.

Active Publication Date: 2020-02-21
HARBIN INST OF TECH SHENZHEN GRADUATE SCHOOL

AI Technical Summary

Problems solved by technology

However, the initial feature library of the existing method is built from manually selected features, so it cannot select features adaptively, and its classification accuracy is limited.


Examples

Embodiment 1

[0060] This embodiment provides a method for classifying natural language topics. Topic classification of natural language is currently something students are required to master. For example, classifying the themes of ancient poems can help students understand the main ideas of those poems. The scheme of the present application can therefore serve as a teaching aid.

[0061] The natural language topic classification method includes: a training stage and a classification stage.

[0062] Figure 1 is a flow chart of the training phase in the natural language topic classification method of Embodiment 1.

[0063] Referring to Figure 1, the training phase includes:

[0064] Step 101: Obtain natural language text segments of known topics as a sample set.

[0065] Step 102: Extract the multiple words with the highest frequency of occurrence in the sample set to obtain multiple feature words; specifically: using the Sunday algorithm to ret...
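
Step 102 names the Sunday algorithm, but the source text is cut off before it says how the algorithm is applied. The following is a minimal sketch of the standard Sunday substring search, assuming it is used to count how often a candidate word occurs in the sample set. The function names `sunday_search` and `count_occurrences` and the sample strings are illustrative only and do not come from the patent.

```python
def sunday_search(text: str, pattern: str) -> list[int]:
    """Return the start index of every occurrence of pattern in text."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    # Shift table: for each character of the pattern, the distance from its
    # last occurrence to the position just past the end of the pattern.
    shift = {ch: m - i for i, ch in enumerate(pattern)}
    hits = []
    i = 0
    while i <= n - m:
        if text[i:i + m] == pattern:
            hits.append(i)
        if i + m >= n:
            break
        # Sunday rule: inspect the character just past the current window.
        # If it is absent from the pattern, skip the whole window plus one.
        i += shift.get(text[i + m], m + 1)
    return hits


def count_occurrences(samples: list[str], word: str) -> int:
    """Total number of occurrences of word across all sample texts (assumed use)."""
    return sum(len(sunday_search(s, word)) for s in samples)


# Hypothetical sample set, for illustration only.
samples = ["moon over the river", "the moon and the frost"]
print(count_occurrences(samples, "moon"))
```

Counting every candidate word this way and keeping the largest counts would yield the high-frequency feature words; whether the patent proceeds exactly like this cannot be confirmed from the truncated text.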

Embodiment 2

[0088] Embodiment 2 describes the technical solution of the present invention in detail, taking ancient poem texts as an example.

[0089] As a special type of natural language text, ancient poetry differs from modern text in sentence structure, format, and expression, and its content is implicit, obscure, and extremely condensed. In addition, monosyllabic words make up the majority of ancient poems, which complicates feature selection. The present invention forms the most efficient feature spectrum (a feature spectrum is the collection of the selected features) by adaptively selecting the features most useful for text classification. Since the classification task is completed on the basis of the selected features, feature selection should in turn be affected by how well the final task is completed; that is, the quality of the classification directly affects the selection of features, so it is very suitabl...

Embodiment 3

[0123] Embodiment 3 provides a natural language topic classification device, comprising:

[0124] A sample acquisition device, configured to acquire a natural language text segment of a known topic as a sample set;

[0125] A high-frequency word extraction device, configured to extract a plurality of words with the highest frequency of occurrence in the sample set to obtain a plurality of feature words;

[0126] A vector representation device, configured to represent each of the feature words as a vector to obtain a plurality of feature vectors;

[0127] A similarity calculation device, configured to calculate the similarity between any two feature vectors to obtain a similarity set; the similarity set reflects the characteristics of, and the connections between, the feature vectors;

[0128] A training and classification device, configured to input the degrees of similarity, the topics, and the feature words corresponding to each topic into the preset neural network structure for t...
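
The first four devices above can be sketched as a small data pipeline. This is an illustrative sketch only: the source does not state how words are tokenized, how a feature word is turned into a vector, or which similarity measure is used. The code below assumes whitespace tokenization, a per-sample occurrence-count vector for each word, and cosine similarity; all function names and sample texts are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def top_k_words(samples: list[str], k: int) -> list[str]:
    """High-frequency word extraction: the k most frequent tokens in the sample set."""
    counts = Counter(tok for s in samples for tok in s.split())
    return [w for w, _ in counts.most_common(k)]


def occurrence_vector(word: str, samples: list[str]) -> list[int]:
    """Vector representation (assumed): how often the word occurs in each sample."""
    return [s.split().count(word) for s in samples]


def cosine(u: list[int], v: list[int]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def similarity_set(samples: list[str], k: int) -> dict[tuple[str, str], float]:
    """Similarity between every pair of feature vectors."""
    words = top_k_words(samples, k)
    vecs = {w: occurrence_vector(w, samples) for w in words}
    return {(a, b): cosine(vecs[a], vecs[b]) for a, b in combinations(words, 2)}


# Hypothetical sample set, for illustration only.
samples = ["moon frost moon", "moon home", "frost home home"]
for pair, sim in similarity_set(samples, 3).items():
    print(pair, round(sim, 3))
```

The resulting similarity set, together with the topics and their feature words, is what the source says is fed into the preset neural network; that network is not sketched here because its structure is not given in the available text.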

Abstract

The invention discloses a natural language theme classification method and device. The training stage of the method comprises the following steps: acquiring natural language text segments with known topics as a sample set; extracting a plurality of words with the highest occurrence frequency in the sample set to obtain a plurality of feature words; representing each feature word as a vector to obtain a plurality of feature vectors; calculating the degree of similarity between any two feature vectors to obtain a similarity set; and inputting the degrees of similarity, the topics, and the feature words corresponding to the topics into a preset neural network structure for training, to obtain a feature spectrum and a model expressing the relationship between the feature spectrum and the classification result. The classification stage comprises the following steps: obtaining a natural language text segment to be classified; extracting from that text segment the feature words belonging to the feature spectrum to obtain input feature parameters; and inputting the input feature parameters into the model expressing the relationship between the feature spectrum and the classification result to obtain the classification result. The method achieves adaptive selection of features and improves classification accuracy.
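
The feature extraction step of the classification stage can be illustrated with a short sketch. It assumes the feature spectrum is an ordered list of words and that the input feature parameters are the occurrence counts of those words in the text to be classified; the abstract does not specify the encoding, and the function name `extract_input_features` and the example data are hypothetical. The trained model itself is not shown.

```python
def extract_input_features(text: str, feature_spectrum: list[str]) -> list[int]:
    """Keep only the words that belong to the feature spectrum.

    Returns one count per spectrum word, in spectrum order (assumed encoding).
    Words outside the spectrum are ignored.
    """
    tokens = text.split()  # assumption: whitespace tokenization
    return [tokens.count(word) for word in feature_spectrum]


# Hypothetical feature spectrum and input text, for illustration only.
spectrum = ["moon", "frost", "home"]
features = extract_input_features("moon over frost moon", spectrum)
print(features)  # [2, 1, 0]
```

In the described method this vector would then be passed to the trained model to obtain the classification result.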

Description

Technical field

[0001] The invention relates to the field of natural language classification, in particular to a natural language topic classification method and device.

Background art

[0002] Text classification algorithms have existed for a long time. As early as the 1950s, scientists used "expert system" methods to classify texts. However, the coverage and classification accuracy of this approach are very limited, and it can only solve some well-conditioned, well-described, and well-organized text classification problems. With the development of statistical methods, and especially with the growth of online text on the Internet and the rise of machine learning after the 1990s, a set of classic methods for solving large-scale text classification problems gradually took shape. The main process is "manual feature engineering" + "classification model", which splits the entire text classification problem into two parts: fea...

Application Information

IPC(8): G06F16/33, G06F16/35, G06N3/08
CPC: G06F16/35, G06F16/3334, G06F16/3346, G06N3/08
Inventors: 赵毅, 王一峰
Owner: HARBIN INST OF TECH SHENZHEN GRADUATE SCHOOL