Chinese text key information extraction method based on pre-trained language model

A pre-trained language model and key information extraction technology, applied in neural learning methods, biological neural network models, natural language data processing, etc. It addresses problems such as polysemy and the lack of word boundary information, with the effect of enriching semantic features and resolving polysemy.

Active Publication Date: 2020-07-24
NANJING UNIV
15 Cites, 34 Cited by

AI Technical Summary

Problems solved by technology

[0006] Purpose of the invention: To address problems that traditional methods cannot solve, such as polysemy and the lack of word boundary information, the present invention proposes a key information extraction method based on a pre-trained language model.

Examples

Detailed Description of the Embodiments

[0024] The present invention is further illustrated below with reference to specific embodiments. It should be understood that these embodiments only illustrate the invention and do not limit its scope. After reading the present invention, modifications of various equivalent forms made by those skilled in the art all fall within the scope defined by the appended claims of the present application.

[0025] The present invention mainly targets the extraction of key text information in complex scenes and presents a method based on a pre-trained language model. The method assigns the information categories to be extracted to two modules: a rule-matching module, and a named entity recognition module based on a deep learning model. It can deeply integrate regular-expression matching features with the semantic features of a deep language model, thereby improving recognition accurac...
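The rule-matching module and its fusion with the deep model can be illustrated with a minimal Python sketch. It assumes that each rule-friendly category is described by one regular expression and that matched content is replaced by a template label before the text reaches the sequence labeling model. The label names (`[DATE]`, `[PHONE]`), the patterns, and the function `apply_rules` are hypothetical and chosen only for illustration; the patent text shown here does not specify them.

```python
import re

# Hypothetical rule templates: labels and patterns are illustrative
# assumptions, not taken from the patent.
RULES = [
    ("[DATE]", re.compile(r"\d{4}年\d{1,2}月\d{1,2}日")),
    ("[PHONE]", re.compile(r"(?<!\d)1[3-9]\d{9}(?!\d)")),  # 11-digit mobile number
]


def apply_rules(text):
    """Replace every rule match with its template label.

    Returns the rewritten text and a list of (label, matched_string)
    pairs, collected rule by rule in the order of RULES.
    """
    extracted = []
    for label, pattern in RULES:
        def substitute(match, label=label):
            # Record the original content, then emit the template label.
            extracted.append((label, match.group(0)))
            return label
        text = pattern.sub(substitute, text)
    return text, extracted


rewritten, found = apply_rules("张三于2020年7月24日致电13812345678")
```

In this sketch `found` holds the rule-extracted values, while `rewritten` is the text that would be passed on to the named entity recognition module, so that digit strings no longer appear as raw characters in its input.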

Abstract

The invention discloses a Chinese text key information extraction method based on a pre-trained language model, comprising the following steps: (1) classifying the key information to be extracted, and extracting the information categories that readily form rules by regular-expression matching; (2) extracting named entities with a sequence labeling model; (3) constructing the sequence labeling model by fine-tuning a pre-trained language model, where the pre-trained language model is first learned from a large-scale unlabeled text corpus and word boundary features are introduced in the pre-training stage; (4) replacing the content matched by the rules with the corresponding rule template labels, so as to fuse rule matching with the deep network; and (5) fine-tuning the pre-trained language model on the labeled training data, transferring it to the sequence labeling task of named entity recognition. The method effectively extracts the contextual semantic features of the text and effectively identifies each information type in scenes with complex information types.
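As a small illustration of what a sequence labeling model produces in steps (2) and (5), the sketch below turns per-character tags into named entities. It assumes the common BIO tagging scheme (`B-X` opens an entity of type `X`, `I-X` continues it, `O` is outside any entity); the abstract does not state which scheme the method uses, and the function `decode_bio` and the example tags are hypothetical.

```python
def decode_bio(chars, tags):
    """Collect (entity_type, text) pairs from per-character BIO tags.

    An I- tag that does not continue an open entity of the same type
    is treated as outside any entity and dropped.
    """
    entities = []
    current_type, current_chars = None, []

    def flush():
        # Close the entity currently being built, if any.
        if current_type is not None:
            entities.append((current_type, "".join(current_chars)))

    for ch, tag in zip(chars, tags):
        if tag.startswith("B-"):
            flush()
            current_type, current_chars = tag[2:], [ch]
        elif tag.startswith("I-") and current_type == tag[2:]:
            current_chars.append(ch)
        else:
            flush()
            current_type, current_chars = None, []
    flush()
    return entities


chars = list("张三在南京大学")
tags = ["B-PER", "I-PER", "O", "B-ORG", "I-ORG", "I-ORG", "I-ORG"]
entities = decode_bio(chars, tags)
```

Here the tags are written by hand; in the described method they would come from the fine-tuned pre-trained language model.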

Description

Technical field

[0001] The invention relates to a method for extracting key information from Chinese text based on a pre-trained language model, and belongs to the technical field of natural language processing and recognition.

Background

[0002] Text key information extraction refers to identifying and extracting the key data types specified in a text according to specific business needs. It mainly comprises the recognition of named entities and the recognition of certain specific types of numeric strings and character strings. Named entity recognition can be handled well by sequence labeling models based on deep learning, but such models cannot simultaneously meet the recognition requirements of the other numeric and character strings, because numeric strings carry no effective semantic information and the various kinds of numeric strings interfere with one another.

[0003] Most of the existing Chinese named entity recognitio...

Application Information

IPC(8): G06F40/295, G06F40/30, G06F40/211, G06N3/04, G06N3/08
CPC: G06N3/088, G06N3/045
Inventors: 俞扬, 詹德川, 周志华, 李龙宇
Owner: NANJING UNIV