Scientific and technological resource relation extraction method and device based on pre-training language model

The invention applies language models and relation extraction technology to computing models, natural language data processing, semantic tool creation, and related fields. It addresses the heavy workload of preparing training data, small dataset sizes, and time-consuming, labor-intensive annotation. Its effects are efficient language feature learning, strong relation extraction ability, and avoidance of polysemy problems.

Inactive Publication Date: 2021-06-18
BEIHANG UNIV

AI Technical Summary

Problems solved by technology

Traditional machine learning models rely heavily on feature quality, and feature engineering is time-consuming and laborious.
The effect of the deep learning model is affected by t

Examples

Embodiment Construction

[0037] The technical content of the present invention will be further described in detail below in conjunction with the accompanying drawings and specific embodiments.

[0038] As shown in Figure 1, the method for extracting scientific and technological resource relations based on a pre-trained language model provided by the embodiment of the present invention includes the following steps:

[0039] Step S1. Input the scientific and technological resource sentence to be classified into the scientific and technological resource relation extraction model trained on the basis of the pre-trained language model.

[0040] The scientific and technological resource sentence to be classified is obtained as follows: at least one sentence is extracted from paper information in any field; each sentence is word-segmented; and entities are then tagged according to an entity word set composed of the papers' keywords and the domain lexicon of an input method. The result is a sentence with its entities tagged (two entities). Wherein, ...
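The tagging described in [0040] can be illustrated with a minimal Python sketch. It is not the patent's implementation: the marker tokens (`<e1>`, `<e2>`), the function name `tag_two_entities`, and the whitespace split standing in for word segmentation are all assumptions made for illustration, and the entity word set here is a toy stand-in for the keyword set plus domain lexicon.

```python
from typing import List, Optional, Set

# Hypothetical marker tokens; the source does not specify a tagging format.
E1_OPEN, E1_CLOSE = "<e1>", "</e1>"
E2_OPEN, E2_CLOSE = "<e2>", "</e2>"


def tag_two_entities(tokens: List[str], entity_words: Set[str]) -> Optional[List[str]]:
    """Wrap the first two lexicon hits in marker tokens.

    `tokens` is an already-segmented sentence; `entity_words` plays the role
    of the entity word set (paper keywords plus a domain lexicon).
    Returns None when fewer than two entities are found.
    """
    hits = [i for i, tok in enumerate(tokens) if tok in entity_words]
    if len(hits) < 2:
        return None
    first, second = hits[0], hits[1]
    tagged: List[str] = []
    for i, tok in enumerate(tokens):
        if i == first:
            tagged += [E1_OPEN, tok, E1_CLOSE]
        elif i == second:
            tagged += [E2_OPEN, tok, E2_CLOSE]
        else:
            tagged.append(tok)
    return tagged


# Toy example: whitespace splitting stands in for real word segmentation.
sentence = "BERT improves text classification accuracy"
entity_word_set = {"BERT", "classification"}
tagged_sentence = tag_two_entities(sentence.split(), entity_word_set)
```

Sentences with fewer than two lexicon hits return `None` in this sketch, on the assumption that relation classification needs exactly one entity pair per input.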

Abstract

The invention discloses a method and device for extracting scientific and technological resource relations based on a pre-trained language model. The method comprises the following steps: inputting the scientific and technological resource sentence to be classified into a scientific and technological resource relation extraction model trained on the basis of a pre-trained language model; and outputting a result from the model, the result being the relation category between the two entities in the sentence. Because the pre-trained language model is trained on massive unsupervised data, it provides more efficient language feature learning while avoiding the polysemy problem caused by external pre-trained word vectors. In addition, by incorporating a metric learning method, the relation extraction problem is treated as a few-shot (small sample) learning problem, so that relatively strong relation extraction ability is obtained from a small amount of training data, and good relation classification can be achieved even when annotated data is scarce.
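As a rough illustration of how metric learning can support few-shot relation classification, the sketch below averages the support embeddings of each relation class into a prototype and assigns a query to the nearest prototype. The abstract names only "a metric learning method"; the prototype-based scheme, the Euclidean distance, the relation labels, and the 2-D toy vectors are assumptions for illustration. In practice the vectors would come from the pre-trained language model's encoding of the tagged sentence.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def mean_vector(vectors: List[Vector]) -> List[float]:
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two vectors of the same length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_prototypes(support: Dict[str, List[Vector]]) -> Dict[str, List[float]]:
    """One prototype per relation class: the mean of its support embeddings."""
    return {label: mean_vector(vecs) for label, vecs in support.items()}


def classify(query: Vector, prototypes: Dict[str, List[float]]) -> str:
    """Return the relation label whose prototype is closest to the query."""
    return min(prototypes, key=lambda label: euclidean(query, prototypes[label]))


# Toy support set: two hypothetical relation classes, two examples each.
support_set = {
    "used-for": [[1.0, 0.0], [0.8, 0.2]],
    "part-of": [[0.0, 1.0], [0.2, 0.8]],
}
prototypes = build_prototypes(support_set)
predicted = classify([0.7, 0.3], prototypes)
```

With only a few labeled sentences per relation, adding a new relation class in this scheme amounts to computing one more prototype rather than retraining a classifier head, which is one reason metric-based approaches suit scarce annotation.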

Description

Technical field

[0001] The invention relates to a method for extracting scientific and technological resource relations based on a pre-trained language model, and also to a corresponding scientific and technological resource relation extraction device. It belongs to the technical field of natural language processing.

Background

[0002] In constructing a knowledge graph of scientific and technological resources, useful knowledge must be extracted from a large volume of text, chiefly the entities in the text and the relations between those entities. Named entity recognition is a subtask of information extraction that aims to identify entities with specific meanings in text data, usually including names of people, places, institutions, domain names, etc. Relation extraction follows entity recognition: once the relevant entities in the text have been identified, the information extraction task must also extract the semantic relation between any two entities, for...

Application Information

IPC(8): G06F40/279, G06F16/33, G06F16/36, G06N20/00
CPC: G06F40/279, G06F16/367, G06F16/3346, G06N20/00
Inventor: 张辉, 王本成, 葛胤池, 金盛豪, 王德庆
Owner: BEIHANG UNIV