Corpus processing method, related device and equipment

A corpus processing method, applied in the field of artificial intelligence, that addresses problems such as a training corpus too small to support normal model training and poor model generalization.

Pending Publication Date: 2021-12-21
TENCENT TECH (SHENZHEN) CO LTD

AI Technical Summary

Problems solved by technology

[0003] When a current semantic analysis platform processes a request sentence, it usually analyzes the sentence with an intent classification model. The intent classification model must be trained on a certain amount of training corpus, but the existing training corpus can provide only a small amount of seed corpus. This is not enough to support normal training of the model, or the trained model overfits the training data, resulting in poor generalization.



Examples


Embodiment Construction

[0060] The embodiments of the present application provide a corpus processing method that performs corpus mining on the corpus to be expanded to obtain a large number of candidate corpora, and then mines from those candidates the target corpora that are semantically similar to the corpus to be expanded. In this way the corpus to be expanded is sufficiently expanded to meet the quantity of corpus that model training requires.

[0061] The terms "first", "second", "third", "fourth", etc. (if any) in the description and claims of the present invention and in the above drawings are used to distinguish similar objects, and are not necessarily used to describe a specific order or sequence. It is to be understood that the data so used are interchangeable under appropriate circumstances, such that the embodiments of the invention described herein can, for example, be practiced in sequences other than those illustrated or described herein. Furthermore, the...


Abstract

The embodiments of the invention disclose a corpus processing method, a related device and equipment, which sufficiently expand a corpus to be expanded so as to meet the quantity of corpus that model training requires. The method comprises the following steps: a corpus to be expanded is obtained; K candidate corpora are obtained according to the corpus to be expanded; and the K candidate corpora and the corpus to be expanded are input into a semantic recognition model to obtain K semantic recognition results. Each semantic recognition result is a similarity score or a similarity classification, where the similarity score represents the degree of semantic similarity between the candidate corpus and the corpus to be expanded, and the similarity classification represents the semantic category between the candidate corpus and the corpus to be expanded. If at least one of the K semantic recognition results meets the corpus extraction condition, the candidate corpus corresponding to that result is determined as a target corpus, so that at least one target corpus belonging to the corpus to be expanded is obtained.
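The filtering step summarized in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the token-overlap function `jaccard_similarity` is only a toy stand-in for the semantic recognition model, and the names `select_target_corpora`, `score_fn` and `threshold`, as well as the extraction condition "score >= 0.5", are hypothetical choices made for illustration.

```python
from typing import Callable, List, Tuple


def jaccard_similarity(a: str, b: str) -> float:
    """Toy stand-in (assumption) for the semantic recognition model.

    Returns a token-overlap score in [0, 1]; a real system would use a
    trained semantic model instead.
    """
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def select_target_corpora(
    seed: str,
    candidates: List[str],
    score_fn: Callable[[str, str], float] = jaccard_similarity,
    threshold: float = 0.5,  # hypothetical corpus extraction condition
) -> List[Tuple[str, float]]:
    """Score each of the K candidates against the seed and keep those that pass."""
    # One semantic recognition result (here: a similarity score) per candidate.
    results = [(candidate, score_fn(candidate, seed)) for candidate in candidates]
    # Candidates whose result meets the extraction condition become target corpora.
    return [(candidate, score) for candidate, score in results if score >= threshold]


if __name__ == "__main__":
    seed = "play some jazz music"
    candidates = [
        "play jazz music",
        "play some music",
        "what is the weather today",
    ]
    for text, score in select_target_corpora(seed, candidates):
        print(f"{score:.2f}  {text}")
```

Passing the scorer in as `score_fn` keeps the selection logic independent of the model, so a classifier that returns a similarity category could be substituted by changing only the scorer and the extraction condition.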

Description

Technical field

[0001] The embodiments of the present application relate to the technical field of artificial intelligence, and in particular to a corpus processing method, related device and equipment.

Background

[0002] With the popularization of artificial intelligence, more and more artificial intelligence technologies bring convenience to people's lives. For example, a user inputs request sentences through a smart assistant, and the smart assistant analyzes and processes these request sentences and passes the processing results to subsequent services for corresponding feedback, thereby completing a voice interaction with the user.

[0003] When a current semantic analysis platform processes a request sentence, it usually analyzes the sentence with an intent classification model. The intent classification model must be trained on a certain amount of training corpus, but the existing training corpus can provide only a smal...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/33, G06F40/30, G06F40/289
CPC: G06F16/3344, G06F40/30, G06F40/289
Inventor: 王明, 包恒耀
Owner: TENCENT TECH (SHENZHEN) CO LTD