A model distillation method combined with dynamic vocabulary enhancement

A vocabulary enhancement and distillation method, applied in computational models, instruments, electrical digital data processing and other fields. It addresses the problems of model inference relying on high-end hardware, declining model accuracy, and loss of semantic information, with the effect of improving semantic understanding, speeding up inference, and reducing model size.
CN112699678A · Active · Publication Date: 2021-04-23 · 达而观数据(成都)有限公司

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
达而观数据(成都)有限公司
Publication Date
2021-04-23

Abstract

The invention relates to the technical field of natural language processing within artificial intelligence and discloses a model distillation method combined with dynamic vocabulary enhancement. The method comprises the following steps: starting from an ALBERT language model, adjusting the language model by combining fine-tuning with a dynamic vocabulary enhancement technique to obtain a fine-tuned language model, and taking the fine-tuned language model as the teacher model; unlike conventional fine-tuning, the features of the dictionary information are combined with the output features of the language model during fine-tuning, and fine-tuning is then performed on the combined features; after fine-tuning is finished, the teacher model is distilled, and its prediction results are used as the training basis for the student model. Because the method introduces dictionary information as key information, the model can still capture dictionary information as a feature even when its size is greatly reduced. Model size is thereby greatly reduced and inference accelerated without sacrificing extraction accuracy.
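The two ideas in the abstract, combining dictionary features with the language model's output features and training a student on the teacher's predictions, can be illustrated with a minimal pure-Python sketch. Everything named here is an assumption for illustration and is not taken from the patent: the begin/inside/end flag encoding in `lexicon_features`, the plain concatenation in `combine_features`, and the temperature-softened cross-entropy in `distillation_loss` (a common distillation objective, which the source does not confirm). The hidden vectors are stand-ins for real ALBERT outputs.

```python
import math

def lexicon_features(tokens, lexicon, max_len=4):
    """Hypothetical encoding: per-token [begin, inside, end] flags for
    every dictionary entry (a tuple of tokens) found in the sequence."""
    feats = [[0.0, 0.0, 0.0] for _ in tokens]
    n = len(tokens)
    for start in range(n):
        for end in range(start + 1, min(n, start + max_len) + 1):
            if tuple(tokens[start:end]) in lexicon:
                feats[start][0] = 1.0          # first token of the entry
                feats[end - 1][2] = 1.0        # last token of the entry
                for mid in range(start + 1, end - 1):
                    feats[mid][1] = 1.0        # tokens strictly inside
    return feats

def combine_features(hidden, lex_feats):
    """Concatenate each token's encoder output with its dictionary flags
    (concatenation is an assumed way of 'combining' the features)."""
    return [h + f for h, f in zip(hidden, lex_feats)]

def softmax(logits, temperature=1.0):
    """Numerically stable softmax with a temperature parameter."""
    scaled = [x / temperature for x in logits]
    top = max(scaled)
    exps = [math.exp(x - top) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Cross-entropy between softened teacher and student distributions
    (assumed objective; the patent text does not specify the loss)."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q))

# Usage with made-up data: a four-token sentence and a one-entry dictionary.
tokens = ["new", "york", "is", "big"]
lexicon = {("new", "york")}
feats = lexicon_features(tokens, lexicon)
hidden = [[0.1, 0.2] for _ in tokens]   # stand-in for encoder outputs
combined = combine_features(hidden, feats)

teacher_logits = [2.0, 0.0, -1.0]
loss_same = distillation_loss(teacher_logits, [2.0, 0.0, -1.0])
loss_diff = distillation_loss(teacher_logits, [0.0, 2.0, -1.0])
```

In this sketch the student loss is lowest when the student reproduces the teacher's logits, so `loss_same` is smaller than `loss_diff`. A real system would compute the logits with trained teacher and student networks and would likely feed the same dictionary flags to both.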

Description

Technical field

[0001] The invention relates to the technical field of natural language processing in the field of artificial intelligence, in particular to a model distillation method combined with dynamic vocabulary enhancement.

Background art

[0002] Key information extraction from text is the most common task in the field of natural language processing. In recent years, since the emergence of BERT, Transformer-based models have appeared in rapid succession. From BERT to RoBERTa, XLNet, GPT-3 and other models, the accuracy achieved on key information extraction tasks has been repeatedly pushed higher. However, when NLP tasks are deployed in practice, enterprises often adopt a high-concurrency model deployment architecture for reasons such as cost and efficiency, and running large models as multiple replicas occupies a large amount of GPU resources. What enterprises are often pursuing is not the highest accuracy rate, but the ...
