Text classification method lacking negative examples

Relates to text classification and text technology; applicable to text database clustering/classification, unstructured text data retrieval, instruments, etc. It addresses the lack of statistical-theory support, poor accuracy and missing negative-example data, improving classification accuracy and giving effective, efficient classification.
CN110795564A · Active · Publication Date: 2020-02-14 · 南京稷图数据科技有限公司 (Nanjing Jitu Data Technology Co., Ltd.)

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Application (China)
Current Assignee / Owner
南京稷图数据科技有限公司 (Nanjing Jitu Data Technology Co., Ltd.)
Publication Date
2020-02-14

Figures

  • Figure 1
  • Figure 2
  • Figure 3

Abstract

The invention discloses a text classification method for data lacking negative examples, in the technical field of machine learning and text classification. The method comprises the following steps. First, the data texts to be classified are determined and custom text classification categories are defined; a TF-IDF model and an LSI model are trained on the obtained corpus; feature vectors of each text are constructed with the trained TF-IDF model and the trained LSI model respectively, and a combined text feature vector is built from them with an ensemble method. Second, a basic classifier is trained with a combined ROC-SVM algorithm and further trained in combination with k-means clustering, while a label classifier is trained at the same time. Finally, the text to be classified is first classified by the basic classifier, then screened with Elasticsearch to determine candidate categories, and finally assigned by the label classifier to one or more of the custom categories. Text data lacking negative examples can thus be classified effectively, with high accuracy, good results and high efficiency.
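The ROC-SVM step is the part of the abstract that deals with the missing negative examples, but the abstract gives no details of it. The following is only a minimal sketch, assuming the common Rocchio-then-SVM scheme for positive-unlabeled learning. A Rocchio classifier first extracts "reliable negatives" from the unlabeled texts. A linear SVM is then trained on positives versus those negatives and used to grow the negative set until it stops changing. Every function name, the Rocchio weights (`alpha=16`, `beta=4`), the Pegasos-style SVM solver and the English-only tokenizer are illustrative assumptions, not the patent's implementation. The LSI features, k-means step, Elasticsearch screening and label classifier are omitted.

```python
# Hypothetical ROC-SVM sketch (Rocchio reliable negatives + iterative linear SVM).
# All names and parameter values are assumptions for illustration only.
import math
import re
from collections import Counter


def tokenize(text):
    # Assumption: lowercase English word tokens; Chinese text would need a word segmenter.
    return re.findall(r"[a-z]+", text.lower())


def fit_idf(docs):
    # Smoothed inverse document frequency over all known (positive + unlabeled) texts.
    n = len(docs)
    df = Counter(t for d in docs for t in set(tokenize(d)))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def normalize(vec):
    # Scale a sparse vector to unit L2 length; zero vectors are returned unchanged.
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else dict(vec)


def tfidf(doc, idf):
    # Sparse, L2-normalised TF-IDF vector; terms not seen when fitting idf are dropped.
    tf = Counter(tokenize(doc))
    return normalize({t: c * idf[t] for t, c in tf.items() if t in idf})


def dot(a, b):
    # Sparse dot product, iterating over the smaller dict.
    if len(a) > len(b):
        a, b = b, a
    return sum(v * b.get(t, 0.0) for t, v in a.items())


def centroid(vecs):
    # Mean of a non-empty list of sparse vectors.
    c = {}
    for v in vecs:
        for t, x in v.items():
            c[t] = c.get(t, 0.0) + x / len(vecs)
    return c


def reliable_negatives(pos, unl, alpha=16.0, beta=4.0):
    # Rocchio step: P is the positive class, the unlabeled set U stands in for negatives.
    # alpha/beta are assumed values, not taken from the patent.
    cp, cu = centroid(pos), centroid(unl)
    terms = set(cp) | set(cu)
    proto_pos = normalize({t: alpha * cp.get(t, 0.0) - beta * cu.get(t, 0.0) for t in terms})
    proto_neg = normalize({t: alpha * cu.get(t, 0.0) - beta * cp.get(t, 0.0) for t in terms})
    # A text in U with higher cosine similarity to the negative prototype is a reliable negative.
    return {i for i, d in enumerate(unl) if dot(d, proto_neg) > dot(d, proto_pos)}


def train_linear_svm(xs, ys, lam=0.01, epochs=50):
    # Linear SVM without bias, fitted by Pegasos-style sub-gradient steps on the hinge loss.
    w, t = {}, 0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            t += 1
            eta = 1.0 / (lam * t)
            violated = y * dot(w, x) < 1
            w = {k: (1 - eta * lam) * v for k, v in w.items()}
            if violated:
                for k, v in x.items():
                    w[k] = w.get(k, 0.0) + eta * y * v
    return w


def roc_svm(pos_docs, unl_docs, max_iter=5):
    idf = fit_idf(pos_docs + unl_docs)
    pos = [tfidf(d, idf) for d in pos_docs]
    unl = [tfidf(d, idf) for d in unl_docs]
    rn = reliable_negatives(pos, unl)
    w = {}
    for _ in range(max_iter):
        neg = [unl[i] for i in sorted(rn)]
        w = train_linear_svm(pos + neg, [1] * len(pos) + [-1] * len(neg))
        # Unlabeled texts that the SVM scores as negative join the negative set.
        new_neg = {i for i, d in enumerate(unl) if i not in rn and dot(w, d) < 0}
        if not new_neg:
            break
        rn |= new_neg
    return idf, w, rn
```

With a few football texts as positives and an unlabeled pool mixing one football text with finance texts, `roc_svm` should mark the finance texts as reliable negatives and leave the football text out of the negative set. A production version would more likely use a library SVM and the combined TF-IDF/LSI vectors described in the abstract instead of plain TF-IDF.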

Description

Technical field

[0001] The invention belongs to the technical field of machine learning and text classification, and in particular relates to a text classification method for data lacking negative examples.

Background art

[0002] With the development of the Internet, the number of online texts has increased dramatically, and demand for text classification has grown accordingly. Classifying such massive volumes of text by hand is clearly impractical, but the rise of machine learning offers a way to meet this demand, and researchers have proposed a series of methods in this field. For example, machine learning methods such as naive Bayes, decision trees, k-nearest neighbors and support vector machines have been applied successfully to text classification with good results. However, because the data texts in different fields are intricate and the mechanisms of many methods are diffe...

Claims
