High-expansibility and multi-label text classification method and device

A multi-label text classification technology, applied in the field of Internet text, that addresses three problems of existing methods: the inability to analyze content outside preset keywords, the inability to handle polysemy, and low judgment accuracy.

Pending Publication Date: 2021-03-30
WISERS INFORMATION LTD

AI Technical Summary

Problems solved by technology

The two existing approaches, keyword matching and machine learning, each have their own disadvantages.
[0004] For the classification method based on keyword matching and keyword combination logic, the keywords and their combination logic must be set manually in advance, which is inefficient and labor-intensive, and the method cannot analyze content outside the preset keywords. It also cannot handle polysemous words in the text. The word "apple", for example, can refer to a fruit or to the world-renowned company Apple; combination logic cannot determine which one is meant and can only interpret the word according to its preset explanation. In addition, this method cannot interpret complex language structures or complex classifications. For example, it can handle a simple category such as "bankruptcy" but cannot deal with the more granular category "industry policy".
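The polysemy limitation described in [0004] can be illustrated with a minimal sketch of a keyword-matching classifier. The rule table `RULES` and the function `keyword_labels` are hypothetical names introduced only for this illustration and do not come from the patent.

```python
# Hypothetical preset rules: each label maps to a set of trigger keywords.
RULES = {
    "fruit": {"apple", "banana"},
    "bankruptcy": {"bankruptcy", "insolvent"},
}


def keyword_labels(text, rules):
    """Return every label whose keyword set overlaps the words of the text."""
    words = set(text.lower().split())
    return sorted(label for label, keywords in rules.items() if words & keywords)


# "Apple" here means the company, but the preset rule can only read it as a fruit.
company_text = keyword_labels("Apple releases a new phone", RULES)

# A single, simple category works as long as its keyword appears literally.
simple_text = keyword_labels("the firm filed for bankruptcy", RULES)

# Text without any preset keyword receives no label at all.
unseen_text = keyword_labels("regulators tighten rules for lenders", RULES)
```

In this sketch the company text is labeled "fruit" because the rule table has no way to use context, and text that contains no preset keyword is left unlabeled.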
[0005] For the text classification method based on machine learning, training relies on a large amount of manually labeled data, so the cost is high and the scalability is poor. If the training data is insufficient, the classifier cannot judge unseen data. Like the combination logic above, machine-learning classification is accurate for broad categories such as "finance" and "sports", but its accuracy is low for fine-grained categories that depend on local text content, such as "potassium cyanide poisoning". Moreover, whenever a new category appears the model must be retrained, which makes such classification methods slow to update and costly.

Examples

Embodiment Construction

[0073] In order to make the object, technical solution and advantages of the present invention clearer, the embodiments of the present invention will be further described in detail below in conjunction with the accompanying drawings.

[0074] The present invention is divided into two parts: the first part classifies the text by topic, and the second part constructs the topic classification template used by the first part. Figure 1 is a flow chart of the algorithm for classifying text by topic; the flow is divided into 7 steps, namely step 101 to step 107. Figure 2 shows the construction process of the topic classification template, which is divided into the construction of the basic topic classification template and the construction of the customized topic classification template, for a total of 5 steps, namely step 201 to step 205.

[0075] Step 101: preprocess the text to be...

Abstract

The invention provides a high-expansibility, multi-label text classification method and device, together with a topic classification template construction method and device whose output the classification method and device can use. The multi-label text classification method comprises the following steps: preprocessing a received text to be classified; calculating a first similarity between the word vector of each word in the preprocessed text and a text center semantic vector; calculating a second similarity between the word vector of each word in the preprocessed text and the central semantic vector of each topic in a topic classification template; calculating a score of the text under each topic in the topic classification template according to the first similarity and the second similarity; filtering non-representative topic labels according to the scores; and outputting the final topic labels and the score of each label according to the filtering result.
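The steps listed in the abstract can be sketched roughly as follows. This is only an illustration under several assumptions that the abstract does not state: similarity is taken to be cosine similarity, the text center semantic vector is taken to be the mean of the word vectors, the per-topic score is a weighted average of the second similarities using the (non-negative) first similarities as weights, and filtering is a fixed threshold. The names `classify`, `EMBEDDINGS`, `TOPIC_CENTERS`, and `threshold` are hypothetical, and the two-dimensional vectors are toy data.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def classify(tokens, embeddings, topic_centers, threshold=0.5):
    """Score preprocessed tokens against each topic and keep topics above threshold."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return {}
    # Assumption: the text center semantic vector is the mean word vector.
    text_center = mean_vector(vectors)
    # First similarity: each word versus the text center (clipped at zero).
    first = [max(cosine(v, text_center), 0.0) for v in vectors]
    total = sum(first)
    scores = {}
    for topic, center in topic_centers.items():
        # Second similarity: each word versus this topic's center vector.
        second = [cosine(v, center) for v in vectors]
        # Assumption: combine the two as a first-similarity-weighted average.
        weighted = sum(f * s for f, s in zip(first, second))
        scores[topic] = weighted / total if total else 0.0
    # Filter out non-representative topics and return labels with scores.
    return {topic: s for topic, s in scores.items() if s >= threshold}


# Toy two-dimensional word vectors and topic centers (illustrative only).
EMBEDDINGS = {"bank": [1.0, 0.1], "loan": [0.9, 0.2], "goal": [0.1, 1.0]}
TOPIC_CENTERS = {"finance": [1.0, 0.0], "sports": [0.0, 1.0]}

labels = classify(["bank", "loan"], EMBEDDINGS, TOPIC_CENTERS)
```

Under these assumptions, adding a topic only means adding an entry to `TOPIC_CENTERS`; no model is retrained, which is one way to read the "high-expansibility" claim.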

Description

Technical field

[0001] The invention relates to the field of Internet texts, in particular to a method for classifying and labeling Internet texts.

Background

[0002] With the advent of the Internet era and the era of big data, people are increasingly inseparable from the Internet and perform various operations online every day. For example, in China, payment in most cities is basically done by mobile payment. The traces that people leave on the Internet every day make the information on the Internet particularly valuable, and the purpose of big data analysis is to extract valuable information or intelligence from it. The Internet holds a huge amount of text information. To analyze information at this scale, it must first be classified effectively. Classifying these network information requires converting these unstructured t...

Application Information

Patent Type & Authority: Applications (China)
IPC (8): G06F16/33, G06F16/35, G06F40/289, G06F40/30, G06F16/953
CPC: G06F16/3344, G06F16/35, G06F16/953
Inventor: 梁冠卿, 高孝先, 何超
Owner: WISERS INFORMATION LTD