Text label determination method and device

A text label determination method applied in text database clustering/classification, unstructured text data retrieval, semantic analysis, and similar areas. It addresses the problem that non-standardized text labels reduce model accuracy, and has the effect of improving accuracy.

Active Publication Date: 2017-05-03
NEUSOFT CORP
Cites: 3; cited by: 22

AI Technical Summary

Problems solved by technology

[0003] In view of the above problems, the present invention provides a method and device for determining a text label...

Embodiment Construction

[0060] Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments, the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth here. Rather, these embodiments are provided so that the present disclosure will be understood more thoroughly and its scope fully conveyed to those skilled in the art.

[0061] To solve the problem that non-standardized text labels reduce model accuracy, an embodiment of the present invention provides a method for determining text labels. As shown in Figure 1, the method includes:

[0062] 101. Use a preset corpus that has already been word-segmented as the training corpus of a semantics-based word-to-vector tool, and train a word vector model on it to obtain a word vector trai...
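Step 101, together with the following step of converting each text's label word into a label word vector, can be illustrated with a small self-contained sketch. The patent names only a "semantics-based word-to-vector tool" in the text shown here, so this is not that tool: as a plainly swapped-in toy stand-in, the sketch builds window-based co-occurrence count vectors normalised to unit length. That is enough to show that label variants used in the same contexts end up with similar vectors. The function names, the window size of 2, and the corpus are hypothetical.

```python
import math


def train_cooccurrence_vectors(segmented_corpus, window=2):
    """Toy stand-in for the word vector model (NOT word2vec): each word's
    vector counts the words appearing within `window` positions of it."""
    vocab = sorted({w for sentence in segmented_corpus for w in sentence})
    index = {w: i for i, w in enumerate(vocab)}
    counts = {w: [0.0] * len(vocab) for w in vocab}
    for sentence in segmented_corpus:
        for i, word in enumerate(sentence):
            for j in range(max(0, i - window), min(len(sentence), i + window + 1)):
                if j != i:
                    counts[word][index[sentence[j]]] += 1.0
    # Normalise to unit length so the dot product equals cosine similarity.
    model = {}
    for w, vec in counts.items():
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        model[w] = [x / norm for x in vec]
    return model


def cosine(u, v):
    # Vectors are unit length (or all zero), so the dot product is the cosine.
    return sum(a * b for a, b in zip(u, v))


# Hypothetical pre-segmented corpus (one list of tokens per sentence).
corpus = [
    ["my", "father", "works", "here"],
    ["my", "dad", "works", "here"],
    ["the", "google", "search", "engine"],
    ["the", "Google", "search", "engine"],
]
model = train_cooccurrence_vectors(corpus)

# Next step: convert each text's label word into its label word vector.
text_labels = {"t1": "father", "t2": "dad", "t3": "google", "t4": "Google"}
label_vectors = {w: model[w] for w in set(text_labels.values())}
```

Because "father" and "dad" occur with the same neighbours, their vectors coincide here, while "father" and "google" share no context and have cosine similarity 0. A real word-vector tool would give graded rather than identical similarities.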

Abstract

The invention discloses a text label determination method and device in the field of natural language processing. It addresses the problem that non-standardized text labels reduce model accuracy. The method comprises the following steps: a preset corpus, after word segmentation, is used as the training corpus of a semantics-based word-to-vector tool to train a word vector model, yielding a word vector training model; the label words of the texts in the corpus are converted into corresponding label word vectors using the word vector training model; the label word vectors of all label words in the corpus are clustered with a preset clustering algorithm to obtain multiple label sets; a cluster word is assigned to each label set, and the correspondence between cluster words and label words is determined; and, according to this correspondence, the cluster word corresponding to each text's label word is taken as that text's new label word. The method and device are applied in text analysis and processing.
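The clustering and relabeling steps of the abstract (cluster the label word vectors, assign a cluster word to each label set, map label words to cluster words, relabel each text) can be sketched as below. Several assumptions are made because the text does not fix them: the "preset clustering algorithm" is taken to be k-means with a deterministic farthest-first initialisation (the listed CPC code G06F18/23213 covers fixed-cluster-count methods such as k-means); the number of clusters `k` is given in advance; and each label set's cluster word is chosen as the member label word closest to the set's centroid. The 2-D vectors are toy stand-ins for real word vectors, and all names are hypothetical.

```python
def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(vectors, k, max_iter=100):
    """Lloyd's k-means with deterministic farthest-first initialisation.
    Returns (cluster index per vector, list of centroids)."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        # Next seed: the vector farthest from its nearest existing centroid.
        far = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(list(far))
    assign = None
    for _ in range(max_iter):
        # Assignment step: each vector joins its nearest centroid.
        new_assign = [min(range(k), key=lambda c: sq_dist(v, centroids[c])) for v in vectors]
        if new_assign == assign:
            break  # converged
        assign = new_assign
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assign) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign, centroids


def normalise_labels(text_labels, label_vectors, k):
    """text_labels: text id -> original label word.
    label_vectors: label word -> label word vector.
    Returns (text id -> new label word, label word -> cluster word)."""
    label_words = sorted(set(text_labels.values()))
    vectors = [label_vectors[w] for w in label_words]
    assign, centroids = kmeans(vectors, k)
    # Assumed rule: a label set's cluster word is its member closest to the centroid.
    cluster_word = {}
    for c in range(k):
        members = [i for i, a in enumerate(assign) if a == c]
        if members:
            best = min(members, key=lambda i: sq_dist(vectors[i], centroids[c]))
            cluster_word[c] = label_words[best]
    label_to_cluster = {w: cluster_word[assign[i]] for i, w in enumerate(label_words)}
    new_labels = {t: label_to_cluster[w] for t, w in text_labels.items()}
    return new_labels, label_to_cluster


# Toy 2-D "label word vectors" standing in for the trained model's output.
label_vectors = {
    "father": [1.0, 0.0], "dad": [0.8, 0.1], "papa": [0.9, 0.3],
    "Google": [0.0, 1.0], "google": [0.1, 0.9], "Google Inc.": [0.05, 0.7],
}
text_labels = {"t1": "father", "t2": "dad", "t3": "papa",
               "t4": "Google", "t5": "google", "t6": "Google Inc."}
new_labels, mapping = normalise_labels(text_labels, label_vectors, k=2)
```

With these toy vectors the three "father" variants collapse to the single label `dad` and the three "Google" variants to `google`. With real word vectors, the choice of `k` and of the cluster-word rule would need tuning.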

Description

Technical field

[0001] The present invention relates to the technical field of natural language processing, and in particular to a method and device for determining a text label.

Background art

[0002] In natural language processing, some of the supervised learning algorithms used to analyze the texts in a corpus require labeled texts as the training corpus, and how standardized the labels of those texts are determines the accuracy of the trained model. At present, corpora usually consist of texts crawled from the Internet, but the labels of such texts are numerous and miscellaneous, with no standard labeling. For example, the same semantic label may have several representations, such as different written forms of "Google", or several different words that all mean "father". Consequently, training a model on such non-standard labels usually degrades its accuracy.

Content...

Application Information

IPC(8): G06F17/30; G06K9/62; G06F17/27
CPC: G06F16/35; G06F40/30; G06F40/289; G06F18/23213
Inventor: 李玉信
Owner: NEUSOFT CORP