A Chinese word segmentation method based on deep learning

A deep-learning-based Chinese word segmentation technology, applied to fields such as instruments, biological neural network models, and computation. It addresses gradient explosion in recurrent neural networks, the inability to handle long-distance historical memory, and the inability to learn data associations well.

Active Publication Date: 2018-12-25

NANJING UNIV OF POSTS & TELECOMM

1 Cites, 32 Cited by


Problems solved by technology

[0005] Early deep-learning approaches to Chinese word segmentation used a simple feed-forward neural network to label each character in the training sequence. This approach only captures context within a fixed window and cannot learn the relationship between the current data and earlier data well.
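The fixed-window limitation can be illustrated with a minimal sketch. The function name `window_features`, the window half-width of 2, and the `<PAD>` token are assumptions made for illustration; none of them come from the patent text.

```python
def window_features(chars, i, half_width=2, pad="<PAD>"):
    """Return the characters in a fixed window centred on position i.

    Positions outside the sentence are filled with a padding token.
    """
    feats = []
    for j in range(i - half_width, i + half_width + 1):
        feats.append(chars[j] if 0 <= j < len(chars) else pad)
    return feats


sentence = list("我爱自然语言处理")
print(window_features(sentence, 0))
print(window_features(sentence, 4))
```

Whatever lies outside the window (for example the first character when labelling position 4) is invisible to the classifier, which is the limitation described above.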

[0006] A recurrent neural network can automatically learn more complex features by accumulating historical memory and making full use of context. In practice, however, recurrent neural networks suffer from exploding and vanishing gradients, which prevents them from handling long-distance historical memory well.
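A rough numerical illustration of this problem, assuming the simplest possible case: a scalar linear recurrence h_t = w * h_(t-1), for which the gradient of the final state with respect to the initial state is w raised to the number of steps. The function name and the chosen weights are illustrative only.

```python
def gradient_through_time(w, steps):
    """Gradient of h_T with respect to h_0 for the recurrence h_t = w * h_(t-1)."""
    grad = 1.0
    for _ in range(steps):
        grad *= w  # one factor of w per time step (chain rule)
    return grad


for w in (0.9, 1.1):
    # A weight slightly below 1 shrinks the gradient towards zero;
    # a weight slightly above 1 makes it grow without bound.
    print(w, gradient_through_time(w, 100))
```

Real recurrent networks multiply Jacobian matrices rather than scalars, but the same shrink-or-grow behaviour over many time steps applies.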


Examples


Embodiment 1

[0252] A Chinese word segmentation method based on deep learning, comprising the steps of:

[0253] Step 1: Compute character frequency statistics over the large-scale corpus D. Using the CBOW model with the HS (hierarchical softmax) training method, initialize each character in corpus D as a basic distributed character vector, then index the resulting character vectors and save them in dictionary V.
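The bookkeeping part of Step 1 (frequency counting and indexing vectors into a dictionary) might look like the sketch below. It does not implement CBOW or hierarchical softmax: small random vectors stand in for the trained ones. The function name `build_char_dictionary`, the vector dimension, and the toy corpus are all assumptions.

```python
import random
from collections import Counter


def build_char_dictionary(corpus, dim=4, seed=0):
    """Count character frequencies and assign each character an index and a vector.

    NOTE: the vectors here are random placeholders. In the method described
    above they would come from CBOW training with hierarchical softmax.
    """
    freq = Counter(ch for sentence in corpus for ch in sentence)
    rng = random.Random(seed)
    index = {}
    vectors = []
    # More frequent characters receive smaller indices.
    for idx, (ch, _count) in enumerate(freq.most_common()):
        index[ch] = idx
        vectors.append([rng.uniform(-0.5, 0.5) for _ in range(dim)])
    return freq, index, vectors


corpus_D = ["我爱北京", "我爱南京"]  # toy stand-in for the large-scale corpus D
freq, char_index, V = build_char_dictionary(corpus_D)
print(freq)
print(char_index)
```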

[0254] Step 2: Convert the training corpus into fixed-length vectors sentence by sentence and feed them into the improved bidirectional LSTM model. Training the parameters of the bidirectional LSTM model refines and updates the character-level vectors in dictionary V, yielding a feature vector that carries contextual semantics and a vector that contains character features.
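The first half of Step 2, turning each sentence into a fixed-length sequence, could be sketched as follows. The patent text does not say how sentences are padded or truncated, so the padding id, the unknown-character id, the truncation rule, and the name `encode_sentence` are assumptions. The bidirectional LSTM itself is not shown.

```python
def encode_sentence(sentence, char_index, max_len, pad_id=0, unk_id=1):
    """Map a sentence to exactly max_len character ids.

    Longer sentences are truncated; shorter ones are right-padded with pad_id.
    Characters missing from char_index are mapped to unk_id.
    """
    ids = [char_index.get(ch, unk_id) for ch in sentence[:max_len]]
    return ids + [pad_id] * (max_len - len(ids))


# Hypothetical index; ids 0 and 1 are reserved for padding and unknown characters.
char_index = {"我": 2, "爱": 3, "北": 4, "京": 5}
print(encode_sentence("我爱北京", char_index, max_len=6))
print(encode_sentence("我爱南京", char_index, max_len=3))
```

Each id would then be used to look up the corresponding character vector in dictionary V before the sequence is passed to the model.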

[0255] Step 3: For each training sentence, processing character by character, use the idea of full segmentation to enumerate all candidate words that end with the current character and fall within the maximum word length, and fuse the refined character-level fea...
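The candidate enumeration in Step 3 can be sketched in a few lines. The function name `candidates_ending_at` and the maximum word length of 4 are assumptions; the fusion of character-level and word-level vectors is not shown because the source text is truncated at that point.

```python
def candidates_ending_at(chars, i, max_word_len=4):
    """List every substring that ends at position i and has at most max_word_len characters."""
    start_min = max(0, i - max_word_len + 1)
    return ["".join(chars[s:i + 1]) for s in range(start_min, i + 1)]


sentence = list("南京市长江大桥")
for i in range(len(sentence)):
    print(i, candidates_ending_at(sentence, i))
```

At each position the list runs from the longest candidate to the single current character, so every possible word boundary within the length limit is considered.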


Abstract

The invention discloses a Chinese word segmentation method based on deep learning, comprising the following steps: Chinese characters are mapped into character vectors based on character frequency; the character vectors are refined to extract a feature vector carrying contextual semantic information and a feature vector carrying character features; the character-level vectors are fused with the word-level distributed representation; the fused candidate vectors are fed into the deep learning model to calculate sentence scores, which are decoded by beam search; and finally the appropriate segmentation result is selected according to the sentence scores. In this way, word segmentation is freed from tedious feature engineering, better system performance is obtained by extracting richer feature information, and the whole segmentation history can be used for modeling, which gives the method sequence-level segmentation ability.
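The decoding stage described in the abstract can be sketched as a word-level beam search. This is not the patent's scoring model: a hand-written score table (`TOY_SCORES`) and an additive sentence score stand in for the neural sentence scorer, and the function names, beam size, and maximum word length are assumptions.

```python
# Hypothetical word scores (higher is better); a trained model would supply these.
TOY_SCORES = {"南京": -1.0, "南京市": -1.2, "市长": -1.5,
              "长江": -1.0, "大桥": -1.0, "长江大桥": -1.1}


def toy_word_score(word):
    """Score a candidate word; unknown words are penalised by length."""
    return TOY_SCORES.get(word, -5.0 * len(word))


def beam_search_segment(sentence, word_score, beam_size=3, max_word_len=4):
    """Return (score, words) for the best segmentation found by beam search."""
    # beams[i] holds up to beam_size hypotheses that cover sentence[:i].
    beams = [[] for _ in range(len(sentence) + 1)]
    beams[0] = [(0.0, [])]
    for i in range(1, len(sentence) + 1):
        hyps = []
        # Extend earlier hypotheses with every candidate word ending at i.
        for s in range(max(0, i - max_word_len), i):
            word = sentence[s:i]
            for score, words in beams[s]:
                hyps.append((score + word_score(word), words + [word]))
        hyps.sort(key=lambda h: h[0], reverse=True)
        beams[i] = hyps[:beam_size]  # prune to the beam width
    return beams[-1][0]


best_score, best_words = beam_search_segment("南京市长江大桥", toy_word_score)
print(best_words, best_score)
```

Keeping only the top few hypotheses at each position bounds the work per character while still comparing whole-sentence scores at the end.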

Description

Technical field

[0001] The invention relates to the technical field of natural language processing, in particular to a Chinese word segmentation method based on deep learning.

Background

[0002] In the current big data environment, with the rapid development of Internet of Things data sensing, cloud computing, triple play, and the mobile Internet, data, especially unstructured text data, is growing exponentially and exhibits characteristics such as diversified types, heterogeneity, globalization, information fragmentation, and low value density. This rapid growth of data poses great challenges to automatic information processing. How to process massive amounts of text efficiently and accurately and extract valuable information has become an important topic in Natural Language Processing (NLP).

[0003] In the field of natural language processing, especially in Chinese natural language processing, word segmentation is an important benchmark task,...


Application Information


IPC(8): G06F17/27, G06N3/02

CPC: G06N3/02, G06F40/289

Inventors: 王传栋, 史宇, 李智

Owner: NANJING UNIV OF POSTS & TELECOMM
