Training corpus acquisition method and apparatus

A training corpus acquisition technology, applied in fields such as special data processing applications and unstructured text data retrieval, that addresses problems such as manual data extraction, missed errors, and slow model optimization.

Active Publication Date: 2016-01-06
BEIJING JINGDONG SHANGKE INFORMATION TECH CO LTD +1

AI Technical Summary

Problems solved by technology

In the existing method, all data must be extracted manually, and problems can only be fixed one at a time as they are found. Many errors are missed through human oversight, a...



Embodiment Construction

[0018] Exemplary embodiments of the present invention are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present invention to facilitate understanding, and they should be regarded as exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

[0019] Figure 1 is a schematic diagram of the basic steps of the training corpus acquisition method according to an embodiment of the present invention. As shown in Figure 1, the training corpus acquisition method may include the following steps S11 to S14.

[0020] Step S11: Obtain the first initial training corpus and the second initial training corpus.

[0021...


Abstract

The present invention provides a training corpus acquisition method and apparatus, and has the advantages of being high in automation degree and quick in acquisition speed. The method comprises: obtaining a first initial training corpus and a second initial training corpus; performing prediction on an optional training statement by using a probabilistic classification model constructed according to the first initial training corpus, so as to obtain a first prediction result; performing prediction on the optional training statement by using a probabilistic classification model constructed according to the first initial training corpus and the second initial training corpus, so as to obtain a second prediction result; and comparing the first prediction result with the second prediction result; if classification information of the first prediction result is inconsistent with that of the second prediction result, or if the classification information of the first prediction result is consistent with that of the second prediction result and a prediction probability of the first prediction result is less than that of the second prediction result, using the optional training statement and the classification information of the second prediction result as a training corpus and outputting the training corpus.
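
The selection rule described in the abstract can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not the patented implementation: the source says only "probabilistic classification model", so the small naive Bayes classifier here is a stand-in, whitespace tokenization is assumed (Chinese text would need word segmentation), and the names `train`, `predict`, and `select_corpus` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train(corpus):
    """Fit a tiny multinomial naive Bayes model on (sentence, label) pairs.

    Naive Bayes is an assumption; the source does not name the algorithm.
    """
    word_counts = defaultdict(Counter)  # label -> word frequencies
    label_counts = Counter()            # label -> number of sentences
    vocab = set()
    for sentence, label in corpus:
        words = sentence.lower().split()
        word_counts[label].update(words)
        label_counts[label] += 1
        vocab.update(words)
    return word_counts, label_counts, vocab


def predict(model, sentence):
    """Return (label, probability) for the most likely label."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    log_scores = {}
    for label, n in label_counts.items():
        denom = sum(word_counts[label].values()) + len(vocab)
        score = math.log(n / total)
        for word in sentence.lower().split():
            if word in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][word] + 1) / denom)
        log_scores[label] = score
    # Normalise the log scores into probabilities over the labels.
    top = max(log_scores.values())
    exp_scores = {k: math.exp(v - top) for k, v in log_scores.items()}
    best = max(exp_scores, key=exp_scores.get)
    return best, exp_scores[best] / sum(exp_scores.values())


def select_corpus(first_corpus, second_corpus, candidates):
    """Apply the comparison rule from the abstract to candidate statements."""
    model_first = train(first_corpus)                 # first corpus only
    model_both = train(first_corpus + second_corpus)  # first + second corpus
    selected = []
    for sentence in candidates:
        label_1, prob_1 = predict(model_first, sentence)
        label_2, prob_2 = predict(model_both, sentence)
        inconsistent = label_1 != label_2
        more_confident = label_1 == label_2 and prob_1 < prob_2
        if inconsistent or more_confident:
            # Output the statement with the second model's classification.
            selected.append((sentence, label_2))
    return selected


if __name__ == "__main__":
    # Made-up toy data, for illustration only.
    first = [("where is my order", "logistics"), ("refund my money", "refund")]
    second = [("track my package", "logistics"), ("return this item", "refund")]
    print(select_corpus(first, second, ["track my order", "hello there"]))
```

In this sketch a statement is kept when adding the second initial corpus either changes its predicted class or raises the probability of the same class. Statements whose prediction is unchanged or less confident are dropped.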

Description

Technical Field

[0001] The present invention relates to the technical field of natural language processing, and in particular to a training corpus acquisition method and apparatus.

Background Art

[0002] Intent recognition means identifying the intention behind a behavior. For example, in a question-and-answer dialogue, each sentence from the questioner carries a certain intention, and the respondent answers according to that intention. Intent recognition is widely used in scenarios such as search engines and chatbots.

[0003] Existing intent recognition methods mainly obtain a batch of corpus and manually label the intent of each item to produce training data. A probabilistic classification model is trained by combining the training data with a specific algorithm, and the resulting model is used to identify the intent of new corpus. Due to the small amount of initially manually annotated corpus, som...


Application Information

IPC(8): G06F17/30
CPC: G06F16/3344, G06F16/35
Inventor: 俞晓光
Owner: BEIJING JINGDONG SHANGKE INFORMATION TECH CO LTD