A classification method and device for unlabeled corpus

A classification method for unlabeled corpus, applied in the field of artificial intelligence, that addresses the problems of overfitting, low classification accuracy for unlabeled corpus, and time-consuming, labor-intensive labeling, and achieves the effect of improving accuracy.

Active Publication Date: 2021-07-02
INDUSTRIAL AND COMMERCIAL BANK OF CHINA
Cites: 6, Cited by: 0

AI Technical Summary

Problems solved by technology

However, training a deep learning model requires collecting and labeling a large number of samples, which is time-consuming and labor-intensive. Accumulating a large amount of labeled data (i.e., sample data) takes a very long time, and high-quality labeled data is expensive.
In addition, deep learning models have many parameters, so they easily overfit when sample data is scarce, and they are very sensitive to noisy data.
To address overfitting caused by scarce sample data, the existing technology selects simpler models, applies techniques such as penalty terms, and uses denoising and sample expansion during data processing. Even so, too little sample data still leaves the trained deep learning model insufficiently accurate, which results in low classification accuracy for unlabeled corpus and limits the application of the deep learning model.
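
The penalty-term technique mentioned above is commonly realized as L2 regularization, where the training loss adds a term that grows with the size of the model weights. The following is a minimal sketch under that assumption; the source does not say which penalty is meant, and the function name `penalized_loss` and the parameter `lam` are hypothetical.

```python
from typing import Sequence


def penalized_loss(errors: Sequence[float], weights: Sequence[float], lam: float = 0.1) -> float:
    """Mean squared error plus an L2 penalty on the weights.

    Assumption: squared error and an L2 penalty are illustrative choices,
    not taken from the source text.
    """
    # Data term: average squared prediction error over the samples.
    data_loss = sum(e * e for e in errors) / len(errors)
    # Penalty term: discourages large weights, which limits overfitting
    # when only a few samples are available.
    penalty = lam * sum(w * w for w in weights)
    return data_loss + penalty


if __name__ == "__main__":
    # Two samples with errors +1 and -1, one non-zero weight of 2.
    print(penalized_loss([1.0, -1.0], [2.0, 0.0], lam=0.5))
```

A larger `lam` pushes the model toward smaller weights. As the text notes, this reduces overfitting but does not make up for a shortage of sample data.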

Embodiment Construction

[0026] To make the purpose, technical solutions and advantages of the embodiments of the present invention clearer, the embodiments are described in further detail below with reference to the accompanying drawings. The exemplary embodiments and their descriptions are used to explain the present invention, not to limit it.

[0027] Figure 1 is a schematic flow chart of the method for classifying unlabeled corpus provided by an embodiment of the present invention. As shown in Figure 1, the method includes:

[0028] S101. Acquire unlabeled corpus, where the unlabeled corpus includes at least one question;

[0029] Specifically, whether it is a human customer service or a dialogue robot, voice dialogues will be generated during the process of serving customers. The above voice dialogues can...

Abstract

The present invention provides a method and device for classifying unlabeled corpus. The method comprises: obtaining unlabeled corpus, where the unlabeled corpus includes at least one question; and inputting each question included in the unlabeled corpus into a text classification model, which outputs the label corresponding to each question. The text classification model is obtained by training on an unlabeled corpus sample, and each piece of corpus data in the unlabeled corpus sample includes a question and an answer. The device is used to perform the above method. The method and device provided by the embodiments of the present invention improve the accuracy of classifying unlabeled corpus.
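
The two steps in the abstract (obtain the questions, then pass each one to a text classification model that outputs a label) can be sketched as follows. This is an illustration only: the function names, the label names, and the keyword-based `toy_model` are hypothetical stand-ins, and the sketch does not reproduce how the patent trains its model from question-and-answer samples.

```python
from typing import Callable, List, Tuple


def classify_unlabeled_corpus(
    questions: List[str],
    model: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Return a (question, label) pair for every question in the corpus."""
    # The method requires the unlabeled corpus to include at least one question.
    if not questions:
        raise ValueError("the unlabeled corpus must include at least one question")
    # Feed each question to the text classification model and collect its label.
    return [(q, model(q)) for q in questions]


def toy_model(question: str) -> str:
    """Hypothetical placeholder for a trained text classification model."""
    text = question.lower()
    if "card" in text:
        return "card_service"
    if "loan" in text:
        return "loan_service"
    return "other"


if __name__ == "__main__":
    corpus = ["How do I activate my card?", "What is the loan interest rate?"]
    for question, label in classify_unlabeled_corpus(corpus, toy_model):
        print(label, "<-", question)
```

Passing the model as a callable keeps the classification step independent of how the model was trained, so a trained classifier could replace `toy_model` without changing the calling code.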

Description

Technical field

[0001] The invention relates to the technical field of artificial intelligence, in particular to a method and device for classifying unlabeled corpus.

Background technique

[0002] With the development of artificial intelligence technology, artificial intelligence-based dialogue robots have been widely used in many fields such as customer service, outbound calls, sales, intelligent search, etc., and intent recognition, as the core technology in the dialogue robot system, directly determines the accuracy of the dialogue and user experience.

[0003] At present, in the intent recognition technology, the more effective technology is the deep learning model. The deep learning model obtained through training can realize the classification of unlabeled corpus, which is helpful for the recognition of intent. However, the training of deep learning models needs to collect and label a large number of samples, which is very time-consuming and labor-intensive, and the a...

Application Information

Patent Type & Authority: Patents (China)
IPC: IPC(8): G06F16/35
CPC: G06F16/35
Inventor: 刘华杰, 李晓萍, 张宏韬
Owner: INDUSTRIAL AND COMMERCIAL BANK OF CHINA