
Low resource domain word splitter training method based on transfer learning and word splitting method

A transfer learning-based training method, applied in the field of natural language processing. It addresses the problem that neural word segmentation needs a large amount of labeled data, while labeled data in specialized domains is limited and good results are therefore difficult to achieve. The method yields smoother learning across domains, improved segmentation, and fewer labeling conflicts.

Inactive Publication Date: 2018-04-27
PEKING UNIV

AI Technical Summary

Problems solved by technology

Although these neural network-based methods are very effective, training them to good results requires a large amount of labeled data.
For many specialized fields, labeled data is very limited, making it difficult to achieve good results with neural network-based word segmentation.



Examples


Embodiment Construction

[0047] The present invention is further described below through embodiments, in conjunction with the accompanying drawings, without limiting its scope in any way.

[0048] The present invention provides a transfer learning method that addresses insufficient resources in word segmentation. A stacked neural network is built on top of models from different domains, and the word segmentation model is trained using data from resource-rich domains together with a small amount of domain-specific data. This reduces conflicts in labeling and other aspects between domains and allows multi-domain data to be learned more smoothly, thereby improving word segmentation in domains with insufficient resources. Figure 1 is a flow chart of the social network text word segmentation method provided by the present invention. The specific process is as follows:

[0049] 1) The input to the algorithm is a segmented and labeled corpus...


Abstract

The invention discloses a transfer learning-based method for training a word segmenter for a low-resource domain, and a word segmentation method. The method comprises: 1) training separately on the target domain and on each of the set domains to generate a corresponding word segmenter for each; 2) using the word segmenters of all domains to segment the target-domain corpus, obtaining each segmenter's hidden-layer representation of each word xi in the target-domain corpus; 3) calculating the relevance between each segmenter's hidden-layer representation of the word xi and the hidden-layer representation of xi from the target-domain segmenter t, and obtaining from the relevance the weight vector of the segmenters of all domains for xi; 4) computing a weighted sum of the segmenters' hidden-layer representations according to the weight vector to obtain the final hidden-layer representation, and calculating the label of xi from it; 5) training the target-domain word segmenter from the predicted labels and the gold-standard results of all words. The method greatly improves word segmentation on low-resource domain corpora.
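Steps 3 and 4 of the abstract can be illustrated with a minimal sketch. The abstract does not state how relevance is computed or how it is turned into weights, so the sketch assumes a dot product for relevance and a softmax for normalization; both are illustrative assumptions, not the patent's stated formulas. The names `combine_hidden`, `hiddens`, and `target_hidden` are hypothetical.

```python
import math


def softmax(scores):
    """Normalize scores into weights that sum to 1 (assumed normalization)."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product, used here as an assumed relevance measure."""
    return sum(a * b for a, b in zip(u, v))


def combine_hidden(hiddens, target_hidden):
    """Weight each segmenter's hidden vector for one word xi and sum them.

    hiddens: one hidden-layer vector per domain segmenter, all for word xi.
    target_hidden: the hidden-layer vector of the target-domain segmenter t.
    Returns (weights, final_hidden).
    """
    # Step 3: relevance of each segmenter's representation to the target's.
    relevance = [dot(h, target_hidden) for h in hiddens]
    weights = softmax(relevance)
    # Step 4: weighted sum of the hidden representations.
    dim = len(target_hidden)
    final_hidden = [
        sum(w * h[d] for w, h in zip(weights, hiddens)) for d in range(dim)
    ]
    return weights, final_hidden


if __name__ == "__main__":
    # Two toy segmenters with 2-dimensional hidden vectors for the same word.
    weights, final_hidden = combine_hidden(
        [[1.0, 0.0], [0.0, 1.0]], target_hidden=[1.0, 0.0]
    )
    print(weights, final_hidden)
```

In this toy case the first segmenter's vector matches the target's, so it receives the larger weight. A label for xi would then be predicted from `final_hidden`, which the sketch leaves out.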

Description

Technical field

[0001] The invention belongs to the field of natural language processing and relates to word segmentation of Chinese text when resources are insufficient, in particular to a transfer learning-based training method for a word segmenter in a low-resource domain, and a word segmentation method.

Background

[0002] For word segmentation in the traditional news domain, statistical methods were the first to achieve good results, mainly conditional random fields and perceptron models. However, these models need to extract a large number of features, so their generalization ability is limited.

[0003] In recent years, neural network-based methods have increasingly been used for automatic feature extraction, and many word segmentation models have emerged, mainly including the Convolutional Neural Network (CNN) and the Long Short-Term Memory network (LSTM). Although the...


Application Information

IPC(8): G06F17/27, G06K9/62, G06N3/04
CPC: G06F40/284, G06N3/045, G06F18/214
Inventor: 孙栩, 许晶晶, 李炜, 马树铭
Owner: PEKING UNIV