Model training corpus expansion method and device and electronic equipment

A technology for model training corpora, applied in the field of big data. It addresses the low mining efficiency that slows model training and optimization and fails to meet the need for fast model iteration. The effect is improved mining efficiency, faster training and optimization, and support for rapid iteration.

Pending Publication Date: 2021-03-30
CHINA PING AN LIFE INSURANCE CO LTD

AI Technical Summary

Problems solved by technology

[0005] The embodiments of the present application provide a method, a device, and electronic equipment for expanding a model training corpus. They address the low efficiency of traditional corpus mining methods when mining sparsely distributed useful data from massive online logs, which slows model training and optimization and cannot meet the model's fast iteration requirements.

Examples

Detailed Description of Embodiments

[0027] In the following description, specific details such as specific system structures and technologies are presented for the purpose of illustration rather than limitation, so as to thoroughly understand the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments without these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.

[0028] It should be understood that, when used in this specification and the appended claims, the term "comprising" indicates the presence of the described features, integers, steps, operations, elements, and/or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or collections thereof.

[0029] It shou...

Abstract

The invention belongs to the technical field of big data and provides a model training corpus expansion method, a device, and electronic equipment. The method comprises: collecting a standard corpus; encoding the standard corpus to obtain a standard corpus vector; matching the standard corpus vector against a pre-constructed index tree to obtain N sentence vectors whose similarity to the standard corpus vector is greater than a threshold, where the leaf nodes of the index tree store sentence vectors obtained by encoding collected sentences; parsing the N sentence vectors to obtain the N corresponding utterance sentences; and screening, from the N utterance sentences, target sentences that meet a set condition as a supplementary corpus, and merging the supplementary corpus into the standard corpus. The scheme improves mining efficiency, accelerates subsequent model training and optimization, and meets the model's need for rapid iteration.
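The steps named in the abstract (encode, match in an index tree, recover sentences, screen, merge) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the abstract does not specify the encoder, the tree type, or the screening condition, so the hashed bag-of-words `encode`, the midpoint-split binary tree, the `min_words` filter, and all names and parameter values here are hypothetical. The search descends to a single leaf, so it is approximate and can miss similar sentences stored in other leaves.

```python
import math
import re
import zlib

DIM = 16        # toy vector size (assumption)
LEAF_SIZE = 2   # target number of vectors per leaf (assumption)

def encode(sentence):
    """Toy encoder: hash lowercase word tokens into a unit-length vector."""
    vec = [0.0] * DIM
    for token in re.findall(r"[a-z0-9]+", sentence.lower()):
        vec[zlib.crc32(token.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec

def cosine(a, b):
    # Vectors are unit length (or all zero), so the dot product is the cosine.
    return sum(x * y for x, y in zip(a, b))

def build_tree(items):
    """items: list of (vector, log_id). Leaf nodes store the sentence vectors."""
    if len(items) <= LEAF_SIZE:
        return {"leaf": items}
    lows = [min(v[d] for v, _ in items) for d in range(DIM)]
    highs = [max(v[d] for v, _ in items) for d in range(DIM)]
    axis = max(range(DIM), key=lambda d: highs[d] - lows[d])
    split = lows[axis] + (highs[axis] - lows[axis]) / 2
    left = [it for it in items if it[0][axis] <= split]
    right = [it for it in items if it[0][axis] > split]
    if not left or not right:  # vectors cannot be separated: keep one leaf
        return {"leaf": items}
    return {"axis": axis, "split": split,
            "left": build_tree(left), "right": build_tree(right)}

def search(tree, query, threshold, n):
    """Descend to one leaf and return up to n (similarity, log_id) hits."""
    node = tree
    while "leaf" not in node:
        side = "left" if query[node["axis"]] <= node["split"] else "right"
        node = node[side]
    hits = [(cosine(query, v), log_id) for v, log_id in node["leaf"]]
    hits = [h for h in hits if h[0] > threshold]
    hits.sort(key=lambda h: h[0], reverse=True)
    return hits[:n]

def expand_corpus(standard, logs, threshold=0.8, n=5, min_words=3):
    tree = build_tree([(encode(s), i) for i, s in enumerate(logs)])
    merged, seen = list(standard), set(standard)
    for sentence in standard:
        for _, log_id in search(tree, encode(sentence), threshold, n):
            candidate = logs[log_id]  # recover the utterance for the vector
            # Hypothetical set condition: unseen and at least min_words words.
            if candidate not in seen and len(candidate.split()) >= min_words:
                merged.append(candidate)
                seen.add(candidate)
    return merged

standard = ["How do I reset my password?"]
logs = [
    "how do i reset my password",
    "what is the weather today",
    "cancel my insurance policy please",
    "ok",
    "I want to file a claim",
]
expanded = expand_corpus(standard, logs)
print(expanded)
```

Because each query touches only one leaf instead of every logged sentence, lookup cost grows with tree depth rather than with log volume, which is the kind of saving the abstract attributes to the index tree. A production system would more plausibly use a learned sentence encoder and an established approximate nearest neighbor index.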

Description

Technical field

[0001] The present application belongs to the field of big data technology, and in particular relates to a method, a device, and electronic equipment for expanding a model training corpus.

Background

[0002] In the field of NLP (natural language processing), model training and continuous optimization usually depend on training with a massive corpus. When training a model, a rich and comprehensive training corpus must be selected to ensure the training effect.

[0003] Massive training corpora are often mined from online logs. The online logs to be mined are themselves massive, with monthly data volume on the order of 100 million. Moreover, online logs are unstructured data, and the data useful for a specific NLP task is very sparse in the logs.

[0004] At present, the common method in the industry is to first give a small amount of standard corpus, scan and...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/33, G06F16/335, G06F16/31, G06N3/08
CPC: G06F16/3338, G06F16/335, G06F16/322, G06N3/08
Inventor: 阮晓义
Owner: CHINA PING AN LIFE INSURANCE CO LTD