Text classification method and device, equipment and medium

A text classification technology, applied in text database clustering/classification, unstructured text data retrieval, special data processing applications, etc. It addresses the high labor cost and insufficient semantic sensitivity of existing approaches, which make it difficult to meet the application requirements of text classification in complex scenarios. It achieves the effects of maximizing model contribution, improving semantic sensitivity, and effectively balancing classification results.

Inactive Publication Date: 2019-11-29
BEIJING BAIDU NETCOM SCI & TECH CO LTD
9 Cites, 34 Cited by

AI Technical Summary

Problems solved by technology

[0003] Text classification based on word vector conversion is currently a commonly used technology. However, some existing solutions rely heavily on feature engineering and on the construction of training samples, which requires a lot of labor, while others are not sensitive enough to semantics. As a result, it is difficult to meet text classification application requirements in complex scenarios.



Examples


Embodiment 1

[0076] Figure 1 is a flow chart of a text classification method in Embodiment 1 of the present application. This embodiment is applicable to classifying and identifying a text to be classified in order to determine its category. The method is executed by a text classification device, which is implemented in software and/or hardware and is configured in electronic equipment with a certain data computing capability.

[0077] As shown in Figure 1, the text classification method includes:

[0078] S101. Obtain text to be classified.

[0079] The text to be classified can be pre-stored locally in the electronic device, in other storage devices associated with the electronic device, or in the cloud, and retrieved when needed; alternatively, the text to be classified can be obtained from the application software that generates it, by real-time or timed acquis...

Embodiment 2

[0097] Figure 2 is a flow chart of a text classification method in Embodiment 2 of the present application. This embodiment is optimized and improved on the basis of the technical solutions of the foregoing embodiments.

[0098] Further, before the operation "classify and identify the text to be classified according to the word vector sequence and the entity vector sequence", the following operations are added: "input the word vector sequence into the word vector attention mechanism model to determine the attention weight of each word vector; input the entity vector sequence into the entity vector attention mechanism model to determine the attention weight of each entity vector". Correspondingly, the operation "classify and identify the text to be classified according to the word vector sequence and the entity vector sequence" is refined into "classify and identify the text to be classified according to the word vector sequence, the entity vector sequence and their respective attention weights", so as to...
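The attention step described in this embodiment can be illustrated with a minimal Python sketch. It is not the patent's implementation: the dot-product scoring against a query vector, the softmax normalization, and the names `attention_weights`, `weighted_sum` and `attended_features` are all assumptions chosen for illustration. The sketch computes one attention weight per word vector and per entity vector, then concatenates the two weighted sums into a single feature vector for a downstream classifier.

```python
import math
from typing import List

Vector = List[float]


def attention_weights(vectors: List[Vector], query: Vector) -> List[float]:
    """Return one softmax-normalized weight per vector.

    Assumption: scores are dot products with a (hypothetical) learned query
    vector. The input sequence is assumed to be non-empty.
    """
    scores = [sum(q * x for q, x in zip(query, v)) for v in vectors]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_sum(vectors: List[Vector], weights: List[float]) -> Vector:
    """Combine a vector sequence into one vector using the given weights."""
    dim = len(vectors[0])
    return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]


def attended_features(
    word_vecs: List[Vector],
    entity_vecs: List[Vector],
    word_query: Vector,
    entity_query: Vector,
) -> Vector:
    """Weight each sequence with its own attention model, then concatenate."""
    word_w = attention_weights(word_vecs, word_query)
    entity_w = attention_weights(entity_vecs, entity_query)
    return weighted_sum(word_vecs, word_w) + weighted_sum(entity_vecs, entity_w)
```

Keeping two separate query vectors mirrors the text's use of a separate attention mechanism model for word vectors and for entity vectors.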

Embodiment 3

[0111] Figure 3 is a flow chart of a text classification method in Embodiment 3 of the present application. This embodiment is optimized and improved on the basis of the technical solutions of the foregoing embodiments.

[0112] The operation of "model training on the entity vector encoding model" is described in detail: the training process of the entity vector encoding model is refined into "use the entity description text in the entity knowledge graph database as entity training samples; use the entity training samples to train the entity vector encoding model", so as to improve the model training mechanism of the entity vector encoding model.

[0113] As shown in Figure 3, the text classification method includes:

[0114] S301. Use the entity description text in the entity knowledge graph database as entity training samples.

[0115] The training samples may include positive training samples and n...
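Step S301 can be illustrated with a minimal Python sketch of building entity training samples from description texts. The source paragraph is truncated, so the details here are assumptions rather than the patent's method: the sketch assumes that a positive sample pairs an entity with its own description, and that a contrasting sample pairs the entity with the description of a different, randomly chosen entity. The function name `build_entity_samples` and the dictionary layout of the knowledge graph descriptions are hypothetical.

```python
import random
from typing import Dict, List, Tuple

# (entity name, description text, label); label 1 = positive, 0 = contrasting
Sample = Tuple[str, str, int]


def build_entity_samples(
    descriptions: Dict[str, str],
    negatives_per_entity: int = 1,
    seed: int = 0,
) -> List[Sample]:
    """Build labelled samples from entity description texts.

    Assumption: a positive sample is an entity with its own description;
    a contrasting sample is the same entity with another entity's description.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    entities = sorted(descriptions)
    samples: List[Sample] = []
    for entity in entities:
        samples.append((entity, descriptions[entity], 1))
        others = [e for e in entities if e != entity]
        count = min(negatives_per_entity, len(others))
        for other in rng.sample(others, count):
            samples.append((entity, descriptions[other], 0))
    return samples


# Hypothetical description texts standing in for a knowledge graph database.
kg_descriptions = {
    "Apple Inc.": "A technology company that designs phones and computers.",
    "Apple (fruit)": "The edible fruit of the apple tree.",
}
samples = build_entity_samples(kg_descriptions)
```

Because the samples come directly from existing description texts, no manual labelling is needed in this sketch, which is in line with the stated goal of avoiding hand-built training samples.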


Abstract

The invention discloses a text classification method and device, equipment and a medium, and relates to the technical field of natural language processing. In the specific implementation scheme, a text to be classified is obtained; the word sequence of the text to be classified is input into a word vector encoding model to determine the word vector sequence of the word sequence; the entity sequence of the text to be classified is input into an entity vector model to determine the entity vector sequence corresponding to the entity sequence, wherein the entity vector model determines entity vectors based on an entity vector encoding model, and the entity vector encoding model is trained on text from an entity knowledge graph database; and the text to be classified is classified and identified according to the word vector sequence and the entity vector sequence. According to the embodiments of the invention, feature engineering and the manual construction of training samples are avoided, which reduces the difficulty of building a text classification model; classification draws on both the word vector sequence and the entity vector sequence, which improves the semantic sensitivity of the text classification model and, in turn, the accuracy of the classification result for the text to be classified.
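The flow in the abstract can be sketched in Python as follows. This is a toy illustration under stated assumptions, not the patented models: the lookup tables `WORD_VECTORS` and `ENTITY_VECTORS` stand in for the trained word vector and entity vector encoding models, mean pooling and the linear scorer `CLASS_WEIGHTS` are simplifications chosen here, and all names and numbers are hypothetical.

```python
from typing import Dict, List

Vector = List[float]

# Hypothetical lookup tables standing in for the trained encoding models.
WORD_VECTORS: Dict[str, Vector] = {
    "apple": [0.9, 0.1],
    "price": [0.5, 0.5],
    "rises": [0.5, 0.5],
}
ENTITY_VECTORS: Dict[str, Vector] = {
    "Apple Inc.": [0.0, 1.0],
}
# Hypothetical linear classifier over [word features ; entity features].
CLASS_WEIGHTS: Dict[str, Vector] = {
    "technology": [0.0, 1.0, 0.0, 1.0],
    "food": [1.0, 0.0, 1.0, 0.0],
}


def encode(sequence: List[str], table: Dict[str, Vector], dim: int = 2) -> List[Vector]:
    """Map each item to its vector; unknown items become zero vectors."""
    return [table.get(item, [0.0] * dim) for item in sequence]


def mean_pool(vectors: List[Vector], dim: int = 2) -> Vector:
    """Average a vector sequence; an empty sequence yields a zero vector."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def classify(words: List[str], entities: List[str]) -> str:
    """Classify a text from its word sequence and its entity sequence."""
    word_seq = encode(words, WORD_VECTORS)
    entity_seq = encode(entities, ENTITY_VECTORS)
    features = mean_pool(word_seq) + mean_pool(entity_seq)  # concatenation
    scores = {
        label: sum(w * x for w, x in zip(weights, features))
        for label, weights in CLASS_WEIGHTS.items()
    }
    return max(scores, key=scores.get)


label_with_entity = classify(["apple", "price", "rises"], ["Apple Inc."])
label_words_only = classify(["apple", "price", "rises"], [])
```

With these toy numbers, the word vectors alone lean toward "food", while adding the entity vector for "Apple Inc." shifts the result to "technology". This is only meant to suggest how an entity vector sequence can add semantic information that the words alone lack.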

Description

Technical field

[0001] The embodiments of the present application relate to computer data processing technology, in particular to the field of natural language processing technology, and specifically to a text classification method, device, equipment and medium.

Background technique

[0002] Text classification is a fundamental task in the field of machine learning and has a very wide range of application scenarios. The goal of text classification is to automatically classify documents in text form into one or more predefined categories.

[0003] Text classification based on word vector conversion is currently a commonly used technology. However, some existing solutions rely heavily on feature engineering and on the construction of training samples, which requires a lot of labor, while others are not sensitive enough to semantics. As a result, it is difficult to meet text classification application requirements in complex scenarios.

Contents of the invention...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/35, G06F17/27
CPC: G06F16/35
Inventor: 汪琦, 冯知凡, 张扬, 朱勇
Owner: BEIJING BAIDU NETCOM SCI & TECH CO LTD