Class center vector text classification method based on dependency, word class and semantic dictionary

A technology based on dependency relationships and a semantic dictionary, applied in the field of class-center vector text classification, which solves problems such as large vector dimensionality, sparse vector weights and low classification accuracy, and achieves the effects of reduced vector sparsity, high classification efficiency and high classification accuracy.

Active Publication Date: 2018-11-06
深圳占领信息技术有限公司 +1

AI Technical Summary

Problems solved by technology

Although the Bayesian algorithm is simple in principle and easy to implement, it rests on an independence assumption: only when the text data are independent of each other will its classification accuracy be high, so it has certain limitations when used in text classification. The K-nearest neighbor (KNN) algorithm has very high classification accuracy but very low classification efficiency: it performs relatively well on small-scale corpora, but its classification time becomes long on large-scale corpora. The support vector machine has very strong generalization ability and is widely applicable to small-sample corpora, but in classification experiments on large-scale corpora its classification effect is not very good. The main advantage of the class center vector method is that the corpus is substantially reduced by analysis before the classification experiment, so the computation required for classification is small and the classification efficiency is high; however, the dimension of the vectors is too large and the vector weights are too sparse, resulting in low classification accuracy.
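As a point of reference, the following is a minimal sketch (not the patent's improved method) of a plain class-center vector classifier over TF-IDF features, assuming scikit-learn for vectorization. It shows why prediction is cheap (one similarity per class) and also where the weakness lies: the centroids live in the full, sparse vocabulary space.

```python
# Minimal sketch (not the patent's improved method): a plain class-center
# vector classifier over TF-IDF features.  Training reduces the corpus to one
# centroid per class, so prediction only needs one similarity per class; the
# centroids, however, live in the full sparse vocabulary space, which is the
# high-dimension / sparse-weight weakness discussed above.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

def train_class_centers(texts, labels):
    """Build one L2-normalized centroid vector per class from TF-IDF vectors."""
    vectorizer = TfidfVectorizer()
    doc_matrix = vectorizer.fit_transform(texts).toarray()
    centers = {}
    for label in set(labels):
        rows = doc_matrix[[i for i, y in enumerate(labels) if y == label]]
        center = rows.mean(axis=0)
        centers[label] = center / (np.linalg.norm(center) + 1e-12)
    return vectorizer, centers

def classify(text, vectorizer, centers):
    """Assign the class whose center has the highest cosine similarity."""
    vec = vectorizer.transform([text]).toarray()[0]
    vec = vec / (np.linalg.norm(vec) + 1e-12)
    return max(centers, key=lambda label: float(vec @ centers[label]))

# Tiny hypothetical example
texts = ["stock market rises", "team wins the match", "shares fall sharply"]
labels = ["finance", "sports", "finance"]
vectorizer, centers = train_class_centers(texts, labels)
print(classify("market shares climb today", vectorizer, centers))  # -> finance
```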



Examples


Embodiment 1

[0092] Experimental comparison of feature selection

[0093] This example combines the three-layer feature selection based on dependency relationship, semantic dictionary and part of speech, and obtains the comparison of F1 value improvement shown in Table 3.

[0094] Table 3 Improvement in F1 value from feature selection

[0095]

[0096]

[0097] As can be seen from Table 3, when feature selection is based only on the dependency relationship, the classification experiments of the Bayesian method, KNN and the text classification method of the present invention on the Fudan corpus, the Sogou corpus and the 20Newsgroups corpus show that dependency-based feature selection already achieves a very good classification effect; after the semantic dictionary is introduced on top of the dependency-based feature selection, the improvement over traditional feature selection ranges from 1.52% to 7.91%, and the contributio...
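For illustration, here is a hedged sketch of a three-layer filter in the spirit of this embodiment: a term is kept only if it fills a core dependency role, has an entry in a semantic dictionary, and carries a content part of speech. The role set, the toy dictionary and the POS set below are assumptions for the example, not the patent's definitions.

```python
# Hedged sketch of a three-layer feature filter: keep a term only if
# (1) it fills a core dependency role, (2) it has an entry in a semantic
# dictionary, and (3) its part of speech is a content class.  The role set,
# the toy dictionary and the POS set are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Token:
    text: str   # surface form
    dep: str    # dependency relation to its head, e.g. "nsubj", "dobj"
    pos: str    # part-of-speech tag, e.g. "NOUN", "VERB"

CORE_DEPS = {"nsubj", "dobj", "root", "amod"}        # assumed core relations
CONTENT_POS = {"NOUN", "VERB", "ADJ"}                # assumed content classes
SEMANTIC_DICT = {"market", "rise", "team", "win"}    # stand-in for a real lexicon

def select_features(tokens):
    """Apply the dependency, semantic-dictionary and POS filters in sequence."""
    kept = [t for t in tokens if t.dep in CORE_DEPS]      # layer 1: dependency
    kept = [t for t in kept if t.text in SEMANTIC_DICT]   # layer 2: dictionary
    kept = [t for t in kept if t.pos in CONTENT_POS]      # layer 3: word class
    return [t.text for t in kept]

sentence = [Token("market", "nsubj", "NOUN"),
            Token("quickly", "advmod", "ADV"),
            Token("rise", "root", "VERB")]
print(select_features(sentence))  # -> ['market', 'rise']
```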

Embodiment 2

[0099] Comparative experiments on the improvement of the class center vector method

[0100] Using the class center vector text classification method based on dependency relationship, part of speech and semantic dictionary proposed by the present invention, experiments were carried out on the three corpora; each of the three innovations of the method was compared experimentally against the original class center vector method, as shown in Table 4.

[0101] Table 4 Comparison of the improved method of the present invention with the traditional class center vector method

[0102]

[0103] It can be seen from Table 4 that the improved method of the present invention was compared with the class center vector method in three stages of experiments; the F1 values of all three stages improved to varying degrees, and the time taken became progressively shorter. This is mainly due to the fact that the present...

Embodiment 3

[0105] Experimental comparison of classification efficiency of class center vector method

[0106] There are many text classification algorithms, such as the Bayesian algorithm, the KNN algorithm and the class center vector method. The Bayesian, KNN and class center vector methods were used to conduct ten-fold cross-validation classification experiments on the three preprocessed corpora, recording the classification time and the F1 value; the experimental results are shown in Table 5.

[0107] Table 5 Comparison of classification algorithm efficiency and accuracy

[0108]

[0109] As can be seen from Table 5, in the classification experiments on the Fudan corpus, the Sogou corpus and the 20Newsgroups corpus, the class center vector method of the present invention has the shortest classification time, while the other classification algorithms are more time-consuming.
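For readers reproducing an experiment of this shape, the sketch below shows a ten-fold cross-validation comparison of classification time and macro-F1 using scikit-learn stand-ins (MultinomialNB for the Bayesian method, KNeighborsClassifier for KNN, NearestCentroid for a plain class-center baseline, not the patent's improved method); `texts` and `labels` are assumed to be a loaded, preprocessed corpus, and k=5 is an assumed setting.

```python
# Illustrative sketch of a ten-fold comparison of classification time and
# macro-F1, using scikit-learn stand-ins: MultinomialNB for the Bayesian
# method, KNeighborsClassifier for KNN, and NearestCentroid as a plain
# class-center vector baseline (not the patent's improved method).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_validate
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier, NearestCentroid
from sklearn.pipeline import make_pipeline

def compare_classifiers(texts, labels, folds=10):
    """Print mean macro-F1 and mean per-fold time for each algorithm."""
    models = {
        "Bayes": MultinomialNB(),
        "KNN": KNeighborsClassifier(n_neighbors=5),   # k=5 is an assumption
        "Class center": NearestCentroid(),
    }
    for name, model in models.items():
        pipe = make_pipeline(TfidfVectorizer(), model)
        cv = cross_validate(pipe, texts, labels, cv=folds, scoring="f1_macro")
        f1 = cv["test_score"].mean()
        secs = (cv["fit_time"] + cv["score_time"]).mean()
        print(f"{name:12s}  macro-F1={f1:.3f}  time/fold={secs:.3f}s")
```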


Abstract

The invention relates to text classification in natural language processing, and specifically to a class center vector text classification method based on dependency, word class and a semantic dictionary. To overcome the semantic deficiency of statistics-based feature selection algorithms, the invention introduces the dependency, the semantic dictionary and the word class to optimize and cluster text features, provides an improved weight calculation formula, and further provides an improved class center vector text classification method. The text classification method of the invention combines the high classification efficiency of the traditional class center vector method with the high classification precision of the K-nearest neighbor algorithm, and can be widely used in various classification systems.

Description

Technical Field

[0001] The invention relates to text classification in natural language processing, in particular to a class center vector text classification method based on dependency relationship, part of speech and semantic dictionary.

Background Technique

[0002] With the rapid development of computer technology, especially in the context of the "Internet +" era, network information such as documents, pictures, audio and video has grown explosively, and a large number of documents exist in electronic form in daily life. How to obtain the desired information from massive data is a hot and difficult point in current research, and text classification is one of the important research directions.

[0003] Text classification is an important research direction in text processing technology that began in the 1950s. It is a comprehensive technology integrating linguistics, mathematics, computer science and cognitive science. In the late 1950s, H.P. ...


Application Information

Patent Type & Authority: Application (China)
IPC(8): G06F17/30
Inventor: 朱新华, 徐庆婷, 吴田俊
Owner: 深圳占领信息技术有限公司