
Chinese semantic matching method based on pinyin and BERT embedding

Semantic matching and Chinese-language technology, applied in semantic analysis, character and pattern recognition, text database clustering/classification, etc.

Pending Publication Date: 2020-07-14
HARBIN UNIV OF SCI & TECH

AI Technical Summary

Problems solved by technology

To address the problem that traditional embedding-vector generation methods cannot directly represent the ambiguity of elements, Yang Piao et al. used the BERT (Bidirectional Encoder Representations from Transformers) pre-trained language model to dynamically generate embedding vectors for Chinese characters in a named entity recognition task, and combined it with a bidirectional GRU and Conditional Random Fields (CRF) to perform the recognition.



Examples


Embodiment Construction

[0030] Exemplary embodiments of the present invention will be described below with reference to the accompanying drawings. In the interest of clarity and conciseness, not all features of an actual implementation are described in this specification. It should be appreciated, however, that many implementation-specific decisions must be made in the course of developing any such practical embodiment in order to achieve the developer's specific goals, such as meeting system-related and business-related constraints, and that these constraints may vary from implementation to implementation. Furthermore, it should be understood that such development work, while potentially complex and time-consuming, would be a routine undertaking for those skilled in the art having the benefit of the teachings herein.

[0031] Here, it should also be noted that, in order to avoid obscuring the present invention with unnecessary details, only the device structure and/or processing steps closely rel...


Abstract

The invention provides a Chinese semantic matching method based on pinyin and BERT embedding. The method comprises the following steps: constructing a semantic matching model comprising a data preprocessing module, a BERT embedding layer module, a pooling layer module and a classifier module, and training the semantic matching model so that the trained model can perform Chinese semantic matching on the sentences to be matched; having the data preprocessing module perform pinyin conversion and pinyin segmentation on each character in the two Chinese sentences to be matched, to obtain a corresponding pinyin sequence; having the BERT embedding layer module generate an embedding vector for each pinyin according to the context of the obtained pinyin sequence, to obtain an embedding vector sequence; having the pooling layer module aggregate the embedding vector sequence into a one-dimensional semantic representation vector for classification; and having the classifier module perform classification on the one-dimensional semantic representation vector, to obtain a prediction result corresponding to the semantic relationship between the two Chinese sentences. According to the method, the amount of data required for pre-training can be greatly reduced while a relatively good effect is ensured.
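
The data flow through the four modules can be illustrated with a minimal Python sketch. It is not the patented implementation, and several parts are assumptions made only for illustration: `TOY_PINYIN` is a hypothetical four-character lookup table, `toy_embed` is a hash-based and non-contextual stand-in for the BERT embedding layer, joining the two sentences with a `[SEP]` token is assumed, mean pooling is one possible choice of aggregation, and the classifier weights are untrained. The pinyin segmentation step is omitted because the abstract does not specify it.

```python
import hashlib
import math

# Hypothetical toy lookup table; a real system would need a full pinyin dictionary.
TOY_PINYIN = {"你": "ni", "好": "hao", "吗": "ma", "您": "nin"}


def to_pinyin_sequence(sentence):
    """Map each character to a pinyin token; unknown characters pass through unchanged."""
    return [TOY_PINYIN.get(ch, ch) for ch in sentence]


def toy_embed(tokens, dim=8):
    """Stand-in for the BERT embedding layer: one fixed vector per token.

    Unlike BERT, this is NOT contextual; it only shows the sequence-of-vectors shape.
    """
    vectors = []
    for tok in tokens:
        digest = hashlib.sha256(tok.encode("utf-8")).digest()
        vectors.append([b / 255.0 - 0.5 for b in digest[:dim]])
    return vectors


def mean_pool(vectors):
    """Aggregate a sequence of vectors into a single vector by averaging (assumed pooling)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def classify(vector, weights, bias):
    """Linear layer followed by softmax over the relationship labels."""
    logits = [sum(w * x for w, x in zip(row, vector)) + b
              for row, b in zip(weights, bias)]
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def match(sentence_a, sentence_b, weights, bias, sep="[SEP]"):
    """Run the four stages: preprocessing, embedding, pooling, classification."""
    # Assumption: the two pinyin sequences are joined with a separator token.
    tokens = to_pinyin_sequence(sentence_a) + [sep] + to_pinyin_sequence(sentence_b)
    pooled = mean_pool(toy_embed(tokens, dim=len(weights[0])))
    return classify(pooled, weights, bias)


if __name__ == "__main__":
    # Untrained placeholder weights for two labels (for example "match" / "no match").
    weights = [[0.1] * 8, [-0.1] * 8]
    print(match("你好", "您好吗", weights, [0.0, 0.0]))
```

Calling `match` returns one probability per label; with untrained weights the values carry no meaning and serve only to show the shape of the output.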

Description

Technical field

[0001] The present invention relates, in particular, to a Chinese semantic matching method based on pinyin and BERT embedding.

Background

[0002] The semantic matching task aims to model the semantics of two sentences and classify the relationship between them, and it underlies many Natural Language Processing (NLP) tasks. For example, in natural language inference tasks, semantic matching is used to judge whether a given hypothesis can be inferred from a given premise; in question answering and information retrieval tasks, it is used to calculate the relevance between the input sentence and each candidate answer and to rank all candidate answers; in machine reading comprehension tasks, it is used to select the correct option from several candidates based on an article and its corresponding questions.

[0003] With the development of deep learning, the deep semantic matching model that automatically extracts feat...


Application Information

IPC(8): G06F16/35, G06K9/62, G06N3/04, G06F40/30
CPC: G06F16/35, G06N3/047, G06N3/045, G06F18/2411
Inventor: 谢金宝, 战岭, 范衠, 黄书山, 林木深, 陈小威
Owner: HARBIN UNIV OF SCI & TECH