Problem semantic matching method for optimizing BERT

A semantic matching technology applied in semantic analysis, neural learning methods, natural language data processing, and related areas. It addresses the low quality of sentence vectors and their difficulty in reflecting semantic similarity, and is reported to produce results quickly.

Pending Publication Date: 2022-03-22

中国医学科学院医学信息研究所

Cites: 0; cited by: 1

Problems solved by technology

Although BERT-based models perform well on many NLP tasks, the sentence vectors derived directly from BERT are of low quality and do not reliably reflect the semantic similarity between two sentences.

Embodiment Construction

[0029] Step 1: Data collection and data preprocessing;

[0030] Collect real medical dialogue records from the Internet and store them locally as plain text.

[0031] Step 2: Split the data into a training set and a validation set;

[0032] Remove non-compliant characters and redundant punctuation marks, unify the half-width and full-width representations of punctuation marks, and use regular expressions to remove non-text content. After preprocessing, perform word segmentation and construct the word segmentation reference file used for whole-word-mask unsupervised training;
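The cleaning pass in paragraph [0032] could look roughly like the following. This is a minimal sketch, not the patent's implementation: the function name `clean_text`, the reading of "non-text content" as URLs and HTML-like tags, and the choice of punctuation marks to de-duplicate are all assumptions. Word segmentation is left out because the source does not name a segmenter.

```python
import re
import unicodedata

# Assumption: "non-text content" is read here as URLs and HTML-like tags.
_URL_OR_TAG = re.compile(r"https?://\S+|<[^>]+>")
# Assumption: runs of the same punctuation mark are collapsed into one.
_REPEATED_PUNCT = re.compile(r"([!?,.;:~])\1+")
_WHITESPACE = re.compile(r"\s+")


def clean_text(text: str) -> str:
    """Normalize one dialogue utterance before word segmentation."""
    # NFKC folds full-width forms (e.g. "！", "Ａ") into half-width ones.
    text = unicodedata.normalize("NFKC", text)
    text = _URL_OR_TAG.sub(" ", text)
    # Turn tabs and newlines into spaces before dropping control characters.
    text = _WHITESPACE.sub(" ", text)
    text = "".join(ch for ch in text if ch.isprintable())
    text = _REPEATED_PUNCT.sub(r"\1", text)
    return _WHITESPACE.sub(" ", text).strip()
```

Using NFKC normalization handles the half-width/full-width unification in one call, at the cost of also folding other compatibility characters; a hand-written mapping would be the alternative if only punctuation should change.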

[0033] Step 3: Starting from the pre-trained model Bert-wwm-ext, perform unsupervised whole-word-mask training;

[0034] Use the following script for unsupervised training of the model, pre-training it on the processed data set so that it learns the characteristics of that data set:

[0035] export TRAIN_FILE=/path/to/dataset/wiki.train.raw

[0036...
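The training script is truncated in the source and is not reconstructed here. To illustrate what whole-word masking means in Step 3, the sketch below masks every character of a selected word together, using the word boundaries from the segmentation step. It is a simplified illustration under stated assumptions: the function name, the 15% default rate, and the plain `[MASK]` replacement (without the random-token and keep-original variants used in BERT pre-training) are not taken from the patent.

```python
import random
from typing import List, Optional, Tuple

MASK = "[MASK]"


def whole_word_mask(
    words: List[str],
    mask_prob: float = 0.15,  # assumed rate, not stated in the source
    rng: Optional[random.Random] = None,
) -> Tuple[List[str], List[str]]:
    """Return (character tokens, labels); a label is "" where no loss applies."""
    rng = rng or random.Random()
    tokens: List[str] = []
    labels: List[str] = []
    for word in words:
        # One draw per word, so all of its characters share the same fate.
        masked = rng.random() < mask_prob
        for ch in word:
            tokens.append(MASK if masked else ch)
            labels.append(ch if masked else "")
    return tokens, labels
```

The input is a list of already segmented words, which is why the reference file built in Step 2 is needed: a character-level tokenizer alone cannot tell where a Chinese word begins and ends.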

Abstract

The invention discloses a Bert-based semantic matching method built on the pre-trained model Bert-wwm-ext from Harbin Institute of Technology. The model first undergoes unsupervised whole-word-mask training on the applicant's own large-scale data so that it adapts to the characteristics of that data, and the adapted model is then saved. The model structure is adjusted as follows: a pooling layer is added to the output layer of Bert. At input time, each batch contains a specific group of sentences, some of which are semantically similar while the rest are semantically different. In this way the model learns in a manner similar to human learning, by contrast between data, which makes it converge more quickly. After the architecture change is complete, sentence semantic similarity training is carried out again on the large corpus; comparisons between synonymous and non-synonymous sentences are computed during training and the model is updated by back-propagation, which yields the sentence semantic similarity. The resulting sentence vector representations are more practical.
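The abstract names a pooling layer and an in-batch comparison between similar and dissimilar sentences, but gives no formulas. The sketch below shows one common way to realize both ideas. Everything specific in it is an assumption rather than the patent's method: mean pooling over non-padding tokens, an InfoNCE-style cross-entropy over cosine similarities, the temperature of 0.05, and all function names. Plain Python lists stand in for the tensors a real training loop would use.

```python
import math
from typing import List, Sequence

Vector = List[float]


def mean_pool(token_vecs: Sequence[Vector], attention_mask: Sequence[int]) -> Vector:
    """Average token vectors over non-padding positions (mask == 1).

    Assumes at least one position is unmasked.
    """
    kept = [v for v, m in zip(token_vecs, attention_mask) if m]
    return [sum(col) / len(kept) for col in zip(*kept)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two non-zero dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def in_batch_contrastive_loss(
    anchors: Sequence[Vector],
    positives: Sequence[Vector],
    temperature: float = 0.05,  # assumed value
) -> float:
    """Mean cross-entropy where positives[i] is the match for anchors[i]
    and every other positives[j] in the batch serves as a negative."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        peak = max(logits)
        # Numerically stable log-sum-exp.
        log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
        total += log_sum - logits[i]
    return total / len(anchors)
```

The loss is small when each sentence vector is closest to its synonymous partner and large when it is closer to a non-synonymous one, which is the kind of signal that back-propagation would use to pull similar sentences together.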

Description

Technical field

[0001] The invention relates to a Bert-based semantic matching technology and belongs to the field of artificial intelligence.

Background

[0002] Current mainstream solutions to the text matching problem fall into three categories: statistical learning, deep learning, and transfer learning. Statistical learning approaches mostly obtain text features through manual or statistical methods and then compare the similarity between text pairs. Typical methods include but are not limited to:

[0003] (1) Evaluating similarity based on string operations, such as edit distance;

[0004] (2) Counting the number of terms and directly using statistical indicators such as similarity coefficients to calculate the similarity between the two texts;

[0005] (3) Obtaining a vector representation of the text through an encoding such as term frequency-inverse document frequency (TF-IDF), and then obtaining text similarity through inner product ...
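Two of the statistical baselines listed in the background, edit distance and TF-IDF similarity, are short enough to sketch. The code below is illustrative only. The source text is truncated after "inner product", so the use of cosine similarity (a normalized inner product) is an assumption, as are the function names and the particular IDF formula `log(N / df) + 1`, which is one of several common variants.

```python
import math
from collections import Counter
from typing import Dict, List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling one-row table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free if equal
            ))
        prev = cur
    return prev[-1]


def tfidf_vectors(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Sparse TF-IDF vectors for tokenized, non-empty documents."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * (math.log(n / df[term]) + 1.0)
            for term, count in tf.items()
        })
    return vectors


def sparse_cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(
        sum(w * w for w in v.values())
    )
    return dot / norm if norm else 0.0
```

Both measures depend only on surface forms: two sentences that share no terms score zero under TF-IDF even when they mean the same thing, which is the limitation that motivates learned sentence vectors.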

Application Information

IPC(8): G06F40/211, G06F40/30, G06N3/04, G06N3/08, G06K9/62

CPC: G06F40/30, G06F40/211, G06N3/084, G06N3/088, G06N3/045, G06F18/22, G06F18/214

Inventor: 高东平, 秦奕, 杨渊, 李玲, 池慧

Owner: 中国医学科学院医学信息研究所
