Word segmentation method based on common information and partial supervised learning of word segmentation tool

A supervised learning and word segmentation technology, applied to neural learning methods, instruments, biological neural network models, etc., that addresses the shortage of labeled data and the poor domain adaptability of Chinese word segmentation, with the effect of improving accuracy.

Active Publication Date: 2021-07-13
HANGZHOU DIANZI UNIV

AI Technical Summary

Problems solved by technology

[0003] To address the shortage of labeled data and the poor domain adaptability of existing Chinese word segmentation, the present invention discloses a cross-domain Chinese word segmentation method that integrates the common information of multiple word segmentation tools with partially supervised learning, improving the accuracy of cross-domain word segmentation.

Examples

Embodiment Construction

[0043] The present invention will be further described below with reference to the accompanying drawings.

[0044] Referring to Figure 1 and Figure 2, a word segmentation method based on the common information of word segmentation tools and partially supervised learning proceeds according to the following steps:

[0045] Step (1): Use a large amount of unlabeled data and a BiLSTM neural network to pre-train a BiLSTM module that carries the common information of multiple word segmentation tools, obtaining a trained BiLSTM neural network module; this BiLSTM neural network module is part of the initial word segmentation model.

[0046] Step (2): Use a small amount of labeled data to train the initial word segmentation model, obtaining an initial word segmentation model M0 based on a convolutional neural network and the common information of multiple word segmentation tools.

[0047] Step (3): Use the initial word segmentation model M0 to label a large unlabeled data set, obtaining a large amount of pseudo-label data. Modify the loss function of the initial word segmentation model M0, using a small amount of labeled data and a large amount of pseudo-lab...
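
Step (3) calls for a modified loss that mixes labeled and pseudo-label data, but the visible text does not say how the loss is modified. The sketch below shows one common choice, offered only as an assumption: average the per-character negative log-likelihood over each data source and down-weight the pseudo-label term. The names `sequence_nll`, `joint_loss`, and `pseudo_weight` are hypothetical and do not come from the patent.

```python
import math
from typing import Dict, List, Sequence, Tuple

TagProbs = Dict[str, float]                 # tag -> probability for one character
Example = Tuple[List[TagProbs], List[str]]  # (model output, target tags) for one sentence


def sequence_nll(probs: Sequence[TagProbs], tags: Sequence[str]) -> float:
    """Mean negative log-likelihood of the target tags for one sentence."""
    if len(probs) != len(tags) or not tags:
        raise ValueError("probs and tags must be non-empty and of equal length")
    return -sum(math.log(p[t]) for p, t in zip(probs, tags)) / len(tags)


def joint_loss(
    labeled: Sequence[Example],
    pseudo: Sequence[Example],
    pseudo_weight: float = 0.5,  # assumed hyperparameter, not given in the source
) -> float:
    """Labeled loss plus a down-weighted pseudo-label loss (an assumed form)."""
    if not labeled:
        raise ValueError("at least one labeled example is required")
    labeled_loss = sum(sequence_nll(p, t) for p, t in labeled) / len(labeled)
    pseudo_loss = (
        sum(sequence_nll(p, t) for p, t in pseudo) / len(pseudo) if pseudo else 0.0
    )
    return labeled_loss + pseudo_weight * pseudo_loss
```

A weight below 1 limits how much the noisier pseudo-labels can pull the model away from the manually labeled data; the value 0.5 here is arbitrary.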

Abstract

The invention discloses a word segmentation method based on the common information of word segmentation tools and partially supervised learning. The method comprises the following steps: (1) pre-training a BiLSTM module that carries the common information of multiple word segmentation tools, using a large amount of unlabeled data and a BiLSTM neural network, to obtain a trained BiLSTM neural network module; (2) training the initial word segmentation model with a small amount of labeled data to obtain an initial word segmentation model M0 based on a convolutional neural network and the common information of multiple word segmentation tools; (3) labeling a large unlabeled data set with M0 to obtain a large amount of pseudo-label data, modifying the loss function of M0, and jointly training the modified M0 on the small amount of labeled data and the large amount of pseudo-label data to obtain a Chinese word segmentation model M1 based on the common information of multiple word segmentation tools and partially supervised learning; and (4) iterating step (3) n times to obtain the final word segmentation model Mn. The invention improves the accuracy of cross-domain Chinese word segmentation.
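
The control flow of steps (2) to (4) can be sketched as a self-training loop. This is a minimal illustration under stated assumptions, not the patented model: the BiLSTM pre-training of step (1) and the convolutional network are replaced by a toy tagger that memorizes the weighted majority BMES tag for each character, and `train_char_tagger`, `self_train`, and `pseudo_weight` are hypothetical names.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

Tags = List[str]                # one BMES tag per character
Example = Tuple[str, Tags]      # (sentence, tags)
Model = Callable[[str], Tags]


def train_char_tagger(
    labeled: Sequence[Example],
    pseudo: Sequence[Example],
    pseudo_weight: float = 0.5,  # assumed down-weighting of pseudo-labels
) -> Model:
    """Toy stand-in for model training: weighted majority tag per character."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for weight, data in ((1.0, labeled), (pseudo_weight, pseudo)):
        for sentence, tags in data:
            for ch, tag in zip(sentence, tags):
                counts[ch][tag] += weight

    def model(sentence: str) -> Tags:
        # Characters never seen in training fall back to the single-character tag "S".
        return [
            counts[ch].most_common(1)[0][0] if ch in counts else "S"
            for ch in sentence
        ]

    return model


def self_train(
    labeled: Sequence[Example], unlabeled: Sequence[str], n_rounds: int
) -> Model:
    """Train M0 on labeled data, then relabel and retrain n_rounds times (M1..Mn)."""
    model = train_char_tagger(labeled, [])                # M0
    for _ in range(n_rounds):
        pseudo = [(s, model(s)) for s in unlabeled]       # pseudo-label with current model
        model = train_char_tagger(labeled, pseudo)        # jointly retrain
    return model
```

Each round relabels the unlabeled set with the latest model, so later rounds train on pseudo-labels produced by M1, M2, and so on rather than by M0 alone.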

Description

Technical field

[0001] The present invention relates to the Chinese word segmentation task, specifically to a word segmentation method based on the common information of word segmentation tools and partially supervised learning, and belongs to the technical field of natural language processing.

Background technique

[0002] In recent years, Chinese word segmentation models based on neural networks have achieved very good segmentation accuracy. However, the accuracy of existing Chinese word segmentation methods and word segmentation tools often declines sharply in specialized domains, and cross-domain segmentation has become a difficult problem in Chinese word segmentation. To address Chinese word segmentation in domains that lack labeled data, a method is proposed that combines the common information of word segmentation tools with partially supervised learning. It trains jointly on a small amount of labeled target-domain data and a large amount of unlabeled target-domain data, gradually optimizes the model through iterative training, and improves the domain adaptability of the word segmentation model. The method performs exp...
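
The visible text does not define how the "common information" of several word segmentation tools is represented. As a rough illustration only, the sketch below converts each tool's output to per-character BMES tags and reports the majority tag together with the fraction of tools that agree on it. The tool outputs are hand-written hypothetical examples, no real segmentation library is called, and `words_to_bmes` and `common_tags` are assumed names.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def words_to_bmes(words: Sequence[str]) -> List[str]:
    """Convert a segmented sentence into one B/M/E/S tag per character."""
    tags: List[str] = []
    for word in words:
        if not word:
            continue  # ignore empty tokens
        if len(word) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (len(word) - 2) + ["E"])
    return tags


def common_tags(segmentations: Sequence[Sequence[str]]) -> List[Tuple[str, float]]:
    """Per character: the majority tag across tools and the share of tools voting for it."""
    tag_rows = [words_to_bmes(seg) for seg in segmentations]
    if len({len(row) for row in tag_rows}) != 1:
        raise ValueError("all segmentations must cover the same characters")
    result = []
    for column in zip(*tag_rows):
        tag, votes = Counter(column).most_common(1)[0]
        result.append((tag, votes / len(tag_rows)))
    return result


# Hypothetical outputs of three tools for the same 8-character string.
tool_outputs = [
    ["杭州", "电子", "科技", "大学"],
    ["杭州", "电子科技", "大学"],
    ["杭州", "电子", "科技大学"],
]
agreement = common_tags(tool_outputs)
```

Positions where all tools agree (here the first three characters and the last) are the kind of shared signal a pre-trained module could learn from unlabeled text; how the patent's BiLSTM module actually encodes it is not stated in the visible text.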

Application Information

IPC(8): G06F40/289, G06N3/04, G06N3/08
CPC: G06F40/289, G06N3/049, G06N3/08, G06N3/045
Inventor: 张旻, 夏小勇, 姜明
Owner: HANGZHOU DIANZI UNIV