Multi-label corpus text classification method based on semi-supervised learning

A semi-supervised learning and text classification technology, applied in the field of multi-label corpus text classification based on semi-supervised learning. It addresses heavy consumption of server performance, slow calculation and long processing time, and its effects are reduced computational complexity and calculation amount, improved efficiency, and improved scalability and practicality.

Status: Inactive. Publication date: 2020-02-04
厦门美域中央信息科技有限公司
Cites: 3 | Cited by: 2

AI Technical Summary

Problems solved by technology

[0004] In existing text classification methods, the ever-growing volume of information raises users' requirements for the accuracy and recall of content search, and the training set contains a very large number of samples. Computing the similarity between the text to be classified and every sample in the training set by traversal consumes a great deal of server performance and is slow. As a result, a large share of the server's resources is occupied and the calculation takes too long, so answering users or pushing relevant information to them is time-consuming.
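To make the cost described in [0004] concrete, here is a minimal sketch of the traversal baseline the patent criticizes: the query is compared against every training sample, so the number of similarity computations grows linearly with the training set. The bag-of-words vectors, the cosine measure and the function names are illustrative assumptions, not taken from the patent.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(v * b[t] for t, v in a.items() if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def classify_by_traversal(query: Counter, training: list) -> tuple:
    """Hypothetical baseline: compare the query with EVERY training sample.

    Returns the label of the most similar sample and the number of
    similarity computations performed (always len(training)).
    """
    best_label, best_score, comparisons = None, -1.0, 0
    for vec, label in training:
        comparisons += 1
        score = cosine(query, vec)
        if score > best_score:
            best_label, best_score = label, score
    return best_label, comparisons


training = [
    (Counter("stock market rises".split()), "finance"),
    (Counter("team wins match".split()), "sports"),
]
label, comparisons = classify_by_traversal(Counter("market stock".split()), training)
print(label, comparisons)  # finance 2
```

With millions of training samples, every query pays for millions of `cosine` calls, which is the server load the invention aims to reduce.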

Method used

…the structure of the environmentally friendly knitted fabric provided by the present invention; Figure 2 is a flow chart of the yarn wrapping machine for environmentally friendly knitted fabrics and storage devices; Figure 3 is the parameter map of the yarn covering machine.


Examples


Embodiment Construction

[0039] In order to make the object, technical solution and advantages of the present invention clearer, the present invention will be further described in detail below in conjunction with specific embodiments and with reference to the accompanying drawings. It should be understood that these descriptions are exemplary only, and are not intended to limit the scope of the present invention. Also, in the following description, descriptions of well-known structures and techniques are omitted to avoid unnecessarily obscuring the concept of the present invention.

[0040] As shown in Figures 1-3, the multi-label corpus text classification method based on semi-supervised learning proposed by the present invention comprises the following steps:

[0041] S1. Perform semi-supervised learning on the multi-label corpus text to obtain a classification strategy knowledge base;

[0042] S2. Preprocess the corpus text to be classified to obtain the feature words in...
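The excerpt does not say how the semi-supervised learning in S1 is carried out. As one possible reading, here is a minimal sketch using self-training, a common semi-supervised technique swapped in here as an assumption: labeled texts seed per-category centroid vectors (standing in for the "classification strategy knowledge base"), and each unlabeled text receives every category whose centroid similarity reaches a threshold, which keeps the result multi-label. The centroid representation, the `threshold` and `rounds` parameters and all function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(v * b[t] for t, v in a.items() if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroids(docs):
    """Average term vector per category; docs is a list of (Counter, set_of_labels)."""
    sums, counts = defaultdict(Counter), Counter()
    for vec, labels in docs:
        for label in labels:
            sums[label].update(vec)
            counts[label] += 1
    return {
        label: Counter({t: v / counts[label] for t, v in total.items()})
        for label, total in sums.items()
    }


def self_train(labeled, unlabeled, threshold=0.5, rounds=3):
    """Hypothetical self-training loop returning (knowledge_base, still_unlabeled)."""
    pool, remaining = list(labeled), list(unlabeled)
    for _ in range(rounds):
        kb = centroids(pool)
        still_unlabeled = []
        for vec in remaining:
            # Multi-label: keep every category that is similar enough.
            labels = {c for c, cen in kb.items() if cosine(vec, cen) >= threshold}
            if labels:
                pool.append((vec, labels))
            else:
                still_unlabeled.append(vec)
        if len(still_unlabeled) == len(remaining):
            break  # no new pseudo-labels this round
        remaining = still_unlabeled
    return centroids(pool), remaining


labeled = [
    (Counter("stock market rises".split()), {"finance"}),
    (Counter("team wins match".split()), {"sports"}),
]
unlabeled = [Counter("stock market falls".split()), Counter("weather".split())]
kb, leftover = self_train(labeled, unlabeled)
print(sorted(kb), len(leftover))  # ['finance', 'sports'] 1
```

A fixed similarity threshold is the simplest pseudo-labeling rule; a real system would likely tune it or add confidence-based stopping criteria.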



Abstract

The invention discloses a multi-label corpus text classification method based on semi-supervised learning. The method comprises the following steps: performing semi-supervised learning based on a multi-label corpus text to obtain a classification strategy knowledge base; preprocessing the corpus text to be classified; classifying the text to be classified and determining a first text content identifier set; determining a first text content set in a preset training data set and, within the first text content set, selecting the text contents corresponding to a certain number N of candidate categories to determine a second text content set; and determining a target category of the text to be classified according to the similarity between its feature words and each piece of text content in the second text content set. The method reduces computational complexity and the amount of calculation and improves the efficiency of text classification.
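The abstract's pipeline can be sketched as follows, under stated assumptions. The patent does not specify how the first text content identifier set is obtained or how the N candidate categories are chosen, so this sketch assumes an inverted index (feature word to training-text IDs) for the first set and ranks categories by how often they occur in that set. The preprocessing, stop-word list, single label per training text and all names are hypothetical. Only the texts in the second set are compared with the query, instead of the whole training set.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical, deliberately tiny stop-word list used only for illustration.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "on", "is", "for"}


def extract_feature_words(text):
    """Assumed preprocessing: lowercase, regex tokenisation, stop-word removal."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)


def cosine(a, b):
    dot = sum(v * b[t] for t, v in a.items() if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def build_inverted_index(training):
    """Map each feature word to the IDs of training texts that contain it."""
    index = defaultdict(set)
    for doc_id, (vec, _label) in training.items():
        for word in vec:
            index[word].add(doc_id)
    return index


def classify(text, training, index, n_candidates=2):
    query = extract_feature_words(text)
    # First text content identifier set: IDs sharing at least one feature word.
    first_ids = set().union(*(index.get(w, set()) for w in query))
    if not first_ids:
        return None
    # Second text content set: texts from the N most frequent categories.
    category_counts = Counter(training[i][1] for i in first_ids)
    candidates = {c for c, _ in category_counts.most_common(n_candidates)}
    second_ids = [i for i in first_ids if training[i][1] in candidates]
    # Target category: category of the most similar text in the second set.
    best = max(second_ids, key=lambda i: cosine(query, training[i][0]))
    return training[best][1]


corpus = {
    1: ("Stock market rises", "finance"),
    2: ("Stock price falls", "finance"),
    3: ("Team wins the match", "sports"),
    4: ("Market for football team shirts", "sports"),
}
training = {i: (extract_feature_words(t), label) for i, (t, label) in corpus.items()}
index = build_inverted_index(training)
print(classify("the stock market", training, index, n_candidates=1))  # finance
```

For a Chinese corpus, the regex tokeniser would have to be replaced by a word segmenter; that choice is outside what the abstract describes.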

Description

Technical field

[0001] The invention relates to the field of corpus text classification, in particular to a multi-label corpus text classification method based on semi-supervised learning.

Background

[0002] Text classification is an important part of text mining. It refers to assigning each document in a document collection to a category according to predefined subject categories. Classifying documents with an automatic text classification system helps people find the information and knowledge they need. Classification is regarded as the most basic form of cognition of information.

[0003] With the rapid growth of text information, especially the surge of online text on the Internet, automatic text classification has become a key technology for processing and organizing large amounts of document data. Text classification is now widely used in various fields. For example, on an Internet platform, the server can classify the text in...


Application Information

Patent type & authority: Application (China)
IPC (8): G06F16/35; G06F16/31; G06K9/62
CPC: G06F16/355; G06F16/319; G06F18/2155; G06F18/22; G06F18/2411
Inventor: 肖清林
Owner: 厦门美域中央信息科技有限公司