
Annotation data acquisition method and device, and equipment

A method for acquiring annotation (labeled) data, applied in the field of annotation data acquisition. It addresses the problem that the large volume of raw data makes manual labeling labor-intensive and costly, and achieves the effect of saving manpower.

Pending Publication Date: 2021-03-30
大众问问(北京)信息科技有限公司

AI Technical Summary

Problems solved by technology

[0004] At present, labeled data is generally obtained by collecting unlabeled raw data and then labeling it manually, item by item. Because the volume of raw data is large, this consumes considerable manpower.




Embodiment approach

[0114] S201: Predefine regular expressions:

[0115] [] One of the contents may be selected, or none (empty); the alternatives are separated by "|".

() Exactly one of the contents must be selected; it cannot be empty. The alternatives are separated by "|".

Representative content needs to be adjusted according to actual needs.

{} The content is composed of multiple replaceable variables, which can be combined in many ways.
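The "[]" and "()" rules above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function name `expand`, the joining of parts with single spaces, and the restriction to non-nested groups are all hypothetical choices made for the example. The "{}" variables are not handled here.

```python
import itertools
import re

# Matches an optional group "[a|b]", a required group "(a|b)", or literal text.
# Assumption: groups are never nested.
_TOKEN = re.compile(r"\[([^\]]*)\]|\(([^)]*)\)|([^\[\]()]+)")


def expand(template: str) -> list[str]:
    """Return every sentence described by the template."""
    choices = []
    for optional, required, literal in _TOKEN.findall(template):
        if optional:
            # "[...]": any one alternative, or nothing at all ("").
            choices.append([""] + [s.strip() for s in optional.split("|")])
        elif required:
            # "(...)": exactly one alternative must be picked.
            choices.append([s.strip() for s in required.split("|")])
        elif literal.strip():
            # Plain text between groups is kept unchanged.
            choices.append([literal.strip()])
    # The Cartesian product of the per-group choices enumerates all sentences.
    return [
        " ".join(part for part in combo if part)
        for combo in itertools.product(*choices)
    ]


for sentence in expand("[please] (open|close) the door"):
    print(sentence)
```

Each optional group contributes one more choice than it has alternatives (the empty choice), so the number of generated sentences is the product of the group sizes.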

[0116] S202: The user defines a domain type and an intent type.

[0117] Assume the user-defined domain type is "navigation".

[0118] In some cases, the step of defining the intent type can be omitted. Alternatively, the user can define the intent type as "other", which denotes an intent with no specific meaning.

[0119] S203: the user inputs text data in a regular expression format:

[0120] Assume that the text data entered by the user is: [I want|we|departure|navigation] (go|go for a while) ; Amon...
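As a worked illustration of how such input could be combined into corpus data, the sketch below enumerates only the two groups that are visible in the example above (the remainder of the text is truncated in the source and is not reconstructed). The groups are transcribed by hand, and the record layout with `text`, `domain`, and `intent` keys is an assumption made for the example; the domain "navigation" and the intent "other" come from steps S202 and [0118].

```python
import itertools

# "[I want|we|departure|navigation]": optional group, "" means it is omitted.
optional_prefix = ["", "I want", "we", "departure", "navigation"]
# "(go|go for a while)": required group, exactly one alternative is chosen.
required_verb = ["go", "go for a while"]

# Hypothetical record layout: one dict per generated sentence.
rows = [
    {
        "text": " ".join(part for part in (prefix, verb) if part),
        "domain": "navigation",
        "intent": "other",
    }
    for prefix, verb in itertools.product(optional_prefix, required_verb)
]

for row in rows:
    print(row)
```

The optional group offers five choices and the required group two, so this fragment alone yields ten labeled sentences from a single line of input.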


Abstract

The embodiments of the invention disclose a method, device, and equipment for acquiring annotation data. The method comprises: obtaining text data in a regular expression format; combining the text data into corpus data based on the regular expression, and obtaining any one or more of the following labels for the corpus data: domain, intent, and slot; and generating, based on the labels and the corpus data, a data set that includes the annotation results. Because the text data conforms to the regular expression, a device can combine it into corpus data and obtain the corresponding labels automatically, which saves manpower compared with manual annotation.
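The slot label mentioned in the abstract can be illustrated with a small sketch. Everything specific here is an assumption rather than the patent's method: the `{destination}` placeholder syntax, the helper name `fill_slots`, the limit of one placeholder per template, and the JSON record layout are hypothetical and chosen only to show how a slot label could be produced alongside the generated text.

```python
import json
import re


def fill_slots(template, slot_values, domain, intent):
    """Yield one labeled record per value of the single {slot} in the template."""
    match = re.search(r"\{(\w+)\}", template)
    if match is None:
        # No placeholder: the sentence carries only domain and intent labels.
        yield {"text": template, "domain": domain, "intent": intent, "slots": []}
        return
    name = match.group(1)
    start = match.start()
    for value in slot_values[name]:
        # Substitute the value and record where it sits in the final text.
        text = template[:start] + value + template[match.end():]
        yield {
            "text": text,
            "domain": domain,
            "intent": intent,
            "slots": [
                {"name": name, "value": value,
                 "start": start, "end": start + len(value)}
            ],
        }


records = list(
    fill_slots(
        "navigate to {destination}",
        {"destination": ["the airport", "the station"]},
        domain="navigation",
        intent="other",
    )
)
for record in records:
    print(json.dumps(record, ensure_ascii=False))
```

Because the slot value is inserted by the generator, its character span is known exactly and no manual annotation step is needed for these records.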

Description

Technical field

[0001] The present invention relates to the technical field of natural language, in particular to a method, device, and equipment for acquiring annotation data.

Background

[0002] In some scenarios, the user can interact by voice with smart devices such as vehicle-mounted devices and smart home devices, or with terminal devices such as mobile phones and computers. These devices perform speech recognition on the voice commands issued by users. During speech recognition, the speech data is converted into corpus data, and the corpus data is input into a trained recognition model for semantic analysis.

[0003] Training the recognition model requires a large amount of labeled text data. For example, text data can be labeled by domain, intent, and slot. A domain refers to data or resources of the same type, together with the services provided around them, such as "restaurant", "hotel", "airplane ticket", "train ticket...


Application Information

IPC(8): G06F40/117, G06F40/30
Inventor 杜京钢
Owner 大众问问(北京)信息科技有限公司