A Domain Adaptive Chinese Word Segmentation Method Based on Deep Learning

A Chinese word segmentation technology based on deep learning, applied in natural language data processing, instruments, biological neural network models, etc. It addresses the weak domain adaptability of Chinese word segmentation models and achieves strong domain adaptability at a low training cost.

Active Publication Date: 2020-05-05
HANGZHOU DIANZI UNIV
4 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

[0003] To address the weak domain adaptability of deep-learning-based Chinese word segmentation models, the present invention provides a domain-adaptive Chinese word segmentation method based on deep learning that enhances the domain adaptability of the model.



Embodiment Construction

[0046] The present invention will be further described below in conjunction with drawings and embodiments.

[0047] As shown in Figures 1-4, the domain-adaptive Chinese word segmentation method based on deep learning is implemented in the following steps:

[0048] Step 1. Process the text sequence to obtain the output of the BERT model, the output of the dictionary module and the output of the language model. As shown in Figure 3, the text sequence is input into the BERT Chinese pre-trained model.

[0049] 1-1. Obtain the output of the BERT model:

[0050] The text sequence is input into the BERT Chinese pre-trained model to obtain the output of the BERT model.
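Step 1-1 relies on each character receiving its own BERT output vector. A minimal sketch of that alignment, assuming the usual Chinese BERT convention that every CJK character is a separate token and the sequence is wrapped as [CLS] ... [SEP]. The function names and the `encoder` callable are hypothetical stand-ins for the pre-trained model (with Hugging Face `transformers`, the analogous per-token vectors come from the model output's `last_hidden_state`).

```python
from typing import Callable, List

Vector = List[float]


def char_tokens(text: str) -> List[str]:
    # Assumption: a Chinese BERT vocabulary treats each CJK character as its
    # own token, so a sentence becomes [CLS] c1 c2 ... cn [SEP].
    return ["[CLS]"] + list(text) + ["[SEP]"]


def bert_char_outputs(text: str, encoder: Callable[[List[str]], List[Vector]]) -> List[Vector]:
    """Return one BERT output vector per character of `text`.

    `encoder` is a hypothetical stand-in for the pre-trained model: it maps a
    token list to one output vector per token.
    """
    tokens = char_tokens(text)
    outputs = encoder(tokens)
    if len(outputs) != len(tokens):
        raise ValueError("encoder must return one vector per token")
    return outputs[1:-1]  # drop the [CLS] and [SEP] positions


if __name__ == "__main__":
    # Toy encoder: vector = [token position, token length].
    toy = lambda toks: [[float(i), float(len(t))] for i, t in enumerate(toks)]
    print(bert_char_outputs("我爱北京", toy))
    # → [[1.0, 1.0], [2.0, 1.0], [3.0, 1.0], [4.0, 1.0]]
```

The one-token-per-character assumption holds for pure Chinese text; digits or Latin letters may be merged into WordPiece tokens by a real tokenizer and would then need an explicit offset mapping.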

[0051]

[0052] Here, E_i denotes the word vector of character i, and the remaining terms denote the forward hidden-layer state of character i-1 and the backward hidden-layer state of character i+1, respectively.
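Paragraph [0052] describes character i by the forward hidden state of character i-1 and the backward hidden state of character i+1. A minimal sketch of that indexing, assuming the forward and backward states have already been computed as per-character vector lists; the function name and the zero vectors used at the sentence boundaries are assumptions, not details from the source.

```python
from typing import List

Vector = List[float]


def lm_char_features(forward: List[Vector], backward: List[Vector]) -> List[Vector]:
    """For each character i, concatenate forward[i-1] and backward[i+1].

    forward[j] summarises characters 0..j (left context) and backward[j]
    summarises characters j..n-1 (right context), so neither chosen state
    has seen character i itself. Boundary positions use a zero vector
    (an assumption of this sketch).
    """
    n = len(forward)
    if len(backward) != n:
        raise ValueError("forward and backward must have the same length")
    if n == 0:
        return []
    zero_f = [0.0] * len(forward[0])
    zero_b = [0.0] * len(backward[0])
    feats = []
    for i in range(n):
        left = forward[i - 1] if i > 0 else zero_f
        right = backward[i + 1] if i < n - 1 else zero_b
        feats.append(left + right)  # list concatenation
    return feats


if __name__ == "__main__":
    fwd = [[1.0], [2.0], [3.0]]
    bwd = [[10.0], [20.0], [30.0]]
    print(lm_char_features(fwd, bwd))
    # → [[0.0, 20.0], [1.0, 30.0], [2.0, 0.0]]
```

Using the states at i-1 and i+1 mirrors how a bidirectional language model predicts character i from its context alone, which is presumably why unlabeled target-domain text can be exploited this way.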

[0053] 1-2 Get the output of the ...


Abstract

The invention discloses a domain-adaptive Chinese word segmentation method based on deep learning, comprising the following steps: step 1, process the text sequence to obtain the output of the BERT model, the output of the dictionary module and the output of the language model; step 2, use a gate structure similar to the gated recurrent unit (GRU) to combine the outputs of the BERT model, the dictionary module and the language model; step 3, use the softmax function to obtain the prediction probability for each character. The invention integrates the dictionary and the unlabeled-set information of the target domain into the BERT model, which greatly enhances the domain adaptability of the Chinese word segmentation model.
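Steps 2 and 3 combine the three outputs with a GRU-like gate and then apply softmax per character. The excerpt does not give the gate equations, so the following is a minimal sketch under stated assumptions: it reuses the standard GRU-cell equations with the BERT vector in the role of the previous hidden state and the concatenated dictionary and language-model features in the role of the input, and it assumes a BMES tag set. The parameter names (W_z, U_z, W_r, U_r, W_h, U_h, W_o, b_o) and all helper functions are hypothetical.

```python
import math
import random
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]

TAGS = ["B", "M", "E", "S"]  # assumed BMES tag set; not stated in the excerpt


def sigmoid(a: float) -> float:
    return 1.0 / (1.0 + math.exp(-a))


def matvec(W: Matrix, v: Vector) -> Vector:
    # Row-major matrix times vector.
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def vadd(*vs: Vector) -> Vector:
    # Element-wise sum of equal-length vectors.
    return [sum(xs) for xs in zip(*vs)]


def gru_style_fusion(h: Vector, x: Vector, p: Dict[str, Matrix]) -> Vector:
    """Fuse a character's BERT vector h with auxiliary features x.

    x is assumed to be [dictionary features ; language-model features] for
    the same character. Equations follow a standard GRU cell.
    """
    z = [sigmoid(a) for a in vadd(matvec(p["W_z"], x), matvec(p["U_z"], h))]  # update gate
    r = [sigmoid(a) for a in vadd(matvec(p["W_r"], x), matvec(p["U_r"], h))]  # reset gate
    rh = [ri * hi for ri, hi in zip(r, h)]
    cand = [math.tanh(a) for a in vadd(matvec(p["W_h"], x), matvec(p["U_h"], rh))]
    # Interpolate between the BERT vector and the candidate built from extra features.
    return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]


def softmax(logits: Vector) -> Vector:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(a - m) for a in logits]
    total = sum(exps)
    return [e / total for e in exps]


def tag_probs(fused: Vector, W_o: Matrix, b_o: Vector) -> Vector:
    # Step 3: linear projection to tag scores, then softmax over tags.
    return softmax(vadd(matvec(W_o, fused), b_o))


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    rng = random.Random(0)
    d, k = 4, 6  # toy sizes: BERT dim d, auxiliary feature dim k
    params = {
        name: random_matrix(d, k if name.startswith("W") else d, rng)
        for name in ["W_z", "U_z", "W_r", "U_r", "W_h", "U_h"]
    }
    h = [rng.uniform(-1, 1) for _ in range(d)]
    x = [rng.uniform(-1, 1) for _ in range(k)]
    fused = gru_style_fusion(h, x, params)
    probs = tag_probs(fused, random_matrix(len(TAGS), d, rng), [0.0] * len(TAGS))
    print(dict(zip(TAGS, probs)))
```

The update gate z lets the model decide, per dimension, how much to trust BERT alone versus the domain-specific dictionary and language-model signals; with all weights at zero the gate sits at 0.5 and the output is simply half the BERT vector.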

Description

Technical Field

[0001] The invention relates to the technical field of natural language processing, and in particular to a domain-adaptive Chinese word segmentation method based on deep learning.

Background

[0002] In recent years, neural-network-based Chinese word segmentation models have achieved breakthroughs in segmentation accuracy. However, such models still adapt poorly to new domains. The problem arises because the training set and the test set belong to different domains: specifically, the test set contains many domain-specific words that do not appear in the training set. Without additional resources, improving the neural network structure alone cannot recognize these missing words well. Because dictionaries and unlabeled sets of the target domain contain many domain-specific words, many researchers use them as additional resources, combined with the BiLSTM model, to enhance the domain adaptability ...
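The background notes that target-domain dictionaries contain much domain-specific vocabulary. The excerpt does not specify how the dictionary module encodes a dictionary, so the following is a hypothetical encoding in the style of n-gram lexicon features: for each character and each length n from 2 to `max_n`, one bit says whether the n-gram ending at that character is a dictionary word and one bit says whether the n-gram starting there is. The function name, `max_n`, and the bit layout are assumptions.

```python
from typing import List, Set


def dictionary_features(text: str, lexicon: Set[str], max_n: int = 4) -> List[List[int]]:
    """Binary n-gram dictionary features per character (hypothetical encoding).

    Row layout for each character i:
    [end_2, start_2, end_3, start_3, ..., end_max_n, start_max_n]
    """
    feats = []
    for i in range(len(text)):
        row = []
        for n in range(2, max_n + 1):
            # n-gram text[i-n+1 .. i] ends at character i
            ends_here = i - n + 1 >= 0 and text[i - n + 1 : i + 1] in lexicon
            # n-gram text[i .. i+n-1] starts at character i
            starts_here = i + n <= len(text) and text[i : i + n] in lexicon
            row.extend([int(ends_here), int(starts_here)])
        feats.append(row)
    return feats


if __name__ == "__main__":
    lexicon = {"北京", "大学", "北京大学", "学生"}
    feats = dictionary_features("北京大学生", lexicon)
    print(feats[3])  # features for the character 学
    # → [1, 1, 0, 0, 1, 0]
```

In the example, 学 fires both as the end of 大学 and 北京大学 and as the start of 学生, so the dictionary exposes the segmentation ambiguity and leaves its resolution to the downstream model.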


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F40/289; G06N3/04
CPC: G06N3/045
Inventors: 张旻 (Zhang Min), 黄涛 (Huang Tao), 姜明 (Jiang Ming), 汤景凡 (Tang Jingfan), 吴俊磊 (Wu Junlei)
Owner: HANGZHOU DIANZI UNIV