
A Chinese-oriented pre-training method and system

A Chinese-oriented pre-training method and system, applied in the field of natural language processing, addressing problems such as large differences between available corpora in data scale, quality, and domain; corpora that are difficult to fully utilize; and corpora insufficient to support the training of pre-training models.

Active Publication Date: 2020-07-14
深圳智能思创科技有限公司

AI Technical Summary

Problems solved by technology

[0005] (1) Large differences between the language characteristics of Chinese and English lead to unsatisfactory model performance.
Most pre-training models are designed for English, and English language characteristics are reflected to varying degrees in their network structures, training methods, and downstream task applications. Because Chinese differs greatly from English, directly transferring these models to the Chinese domain often yields unsatisfactory results.
[0006] (2) Available Chinese unsupervised corpora are clearly stratified, with large differences in data scale, quality, and domain, and are difficult to fully utilize.
Generally speaking, relatively low-quality corpora are large in scale, while high-quality corpora are mostly small in scale and insufficient on their own to support the training of very large pre-training models. If corpora of widely varying quality are treated equally and mixed together for training, the low-quality corpora may dilute the effect of the high-quality corpora.
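One way to avoid treating corpora of different quality equally, as discussed above, is to sample training examples from each corpus with a fixed probability rather than concatenating them. The following sketch is purely illustrative: the corpus names, contents, and weights are assumptions for demonstration, not values specified by the patent.

```python
import random

def make_mixed_sampler(corpora, weights, seed=0):
    """Yield (corpus_name, example) pairs, drawing each corpus
    with a fixed sampling probability instead of raw size."""
    rng = random.Random(seed)
    names = list(corpora)
    probs = [weights[n] for n in names]
    while True:
        name = rng.choices(names, weights=probs, k=1)[0]
        yield name, rng.choice(corpora[name])

# Hypothetical three-level corpora of very different sizes
corpora = {
    "large_general": [f"web sentence {i}" for i in range(1000)],
    "high_quality": [f"edited sentence {i}" for i in range(100)],
    "domain_specific": [f"domain sentence {i}" for i in range(50)],
}
# Up-weight the small high-quality sets so the large web corpus
# does not drown them out (weights are illustrative)
weights = {"large_general": 0.5, "high_quality": 0.3, "domain_specific": 0.2}

sampler = make_mixed_sampler(corpora, weights)
batch = [next(sampler) for _ in range(8)]
```

With such a sampler, the effective mixing ratio is controlled by the weights rather than by the raw corpus sizes, so a small high-quality corpus keeps its influence during training.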

Method used



Embodiment Construction

[0040] The technical solution of the present invention will be further described below in conjunction with the accompanying drawings.

[0041] The present invention is a Chinese-oriented pre-training method and system. The Chinese-oriented pre-training system is used to realize the Chinese-oriented pre-training method, and includes a model parameter configuration module, a pre-training model generation module, and a service encapsulation module, where:

[0042] The model parameter configuration module displays a user interface through which users can customize the parameters of the Chinese pre-training model. Configurable parameters include whether to introduce a model tuning method, the hyperparameters of the pre-training model, and the three levels of unsupervised pre-training corpus: a large-scale general corpus, a high-quality general corpus, and a domain-specific ...
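The configurable parameters described in [0042] could be collected in a single configuration object before training. This is a minimal sketch; the field names, default values, and corpus paths are assumptions made for illustration and do not come from the patent.

```python
from dataclasses import dataclass

@dataclass
class PretrainConfig:
    # Hyperparameters of the pre-training model (illustrative defaults)
    hidden_size: int = 768
    num_layers: int = 12
    learning_rate: float = 1e-4
    # Whether to introduce a model tuning method (flag name is assumed)
    use_tuning: bool = False
    # Paths to the three levels of unsupervised pre-training corpus
    large_general_corpus: str = "data/large_general.txt"
    high_quality_corpus: str = "data/high_quality.txt"
    domain_corpus: str = "data/domain.txt"

# A user overrides only the parameters they care about,
# as they might through the configuration interface
cfg = PretrainConfig(use_tuning=True, num_layers=6)
```

Grouping the user's choices into one object makes it straightforward for the pre-training model generation module to consume them as a unit.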


Abstract

The invention relates to a Chinese-oriented pre-training method and system comprising a model parameter configuration module, a pre-training model generation module, and a service encapsulation module. The model parameter configuration module targets users who need to customize the Chinese pre-training model to their own needs, letting them configure the parameters of the pre-training model through a friendly interface. The pre-training model generation module trains a Chinese pre-training model according to the model parameters and pre-training corpus data submitted by the user, and saves it as a model file. The service encapsulation module packages the model file into a Chinese feature extractor service and provides users with a corresponding Docker image to facilitate service deployment. Aimed specifically at large-scale unsupervised Chinese corpora, the invention proposes a Chinese-oriented pre-training method and system that effectively improves the performance of pre-training methods on Chinese tasks.
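The configure → train → package flow of the three modules in the abstract can be sketched as a small pipeline. Everything below is a hedged illustration: the method names, the placeholder "training", and the Docker image tag are assumptions, not the patent's actual implementation.

```python
class PretrainPipeline:
    """Minimal sketch of the three modules: configure, train, package."""

    def configure(self, user_params):
        # Model parameter configuration module: collect the user's choices
        self.params = dict(user_params)
        return self.params

    def train(self, corpus):
        # Pre-training model generation module: train on the submitted
        # corpus and save a model file (real training is replaced here
        # by a placeholder artifact)
        self.model_file = {"params": self.params, "trained_on": len(corpus)}
        return self.model_file

    def package(self):
        # Service encapsulation module: wrap the model file as a Chinese
        # feature extractor service, e.g. inside a Docker image
        # (no image is actually built in this sketch)
        return {"image": "chinese-feature-extractor:latest",
                "model": self.model_file}

pipe = PretrainPipeline()
pipe.configure({"num_layers": 12})
pipe.train(["句子一", "句子二"])
artifact = pipe.package()
```

The point of the sketch is the separation of concerns: each module hands a single artifact (parameters, model file, service image) to the next.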

Description

Technical field

[0001] The invention relates to a Chinese-oriented pre-training method and system, belonging to the technical field of natural language processing.

Background technique

[0002] Generally speaking, most deep-learning-based natural language processing tasks can be divided into three modules: data processing, text representation, and a task-specific model. The data processing module and the task-specific model module must be designed for each particular task, while the text representation module can serve as a relatively general component. Pre-training a general text representation module so that text features can be reused is therefore of great significance for text transfer learning. In recent years, with the rapid development of deep learning methods, important breakthroughs have been made in machine translation, machine reading comprehension, named entity recognition, and other areas of natural language processing...

Claims


Application Information

Patent Type & Authority: Patent (China)
IPC (8): G06F40/30, G06F40/289, G06N3/04, G06N3/08, G06F9/455
CPC: G06N3/084, G06F9/45558, G06N3/047, G06N3/044, G06N3/045
Inventor: 李舟军, 刘俊杰, 肖武魁, 覃维, 陈小明, 范宇
Owner: 深圳智能思创科技有限公司