A Chinese word segmentation method based on deep learning

A deep-learning technique for Chinese word segmentation, applied in fields such as instruments, biological neural network models, and computing. It addresses problems of recurrent neural networks such as vanishing gradients, exploding gradients, and the inability to handle long-distance historical memory.

Active Publication Date: 2022-07-26

NANJING UNIV OF POSTS & TELECOMM

AI Technical Summary

Problems solved by technology

[0005] Early deep-learning approaches to Chinese word segmentation used a simple feedforward neural network to label each character in the training sequence. This approach captures context only within a fixed window and cannot learn dependencies between the current input and earlier inputs well.
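The fixed-window limitation can be illustrated with a minimal sketch. The function name `window_features`, the window size, and the padding token are assumptions made for illustration; they do not come from the patent.

```python
def window_features(chars, center, window=2, pad="<PAD>"):
    """Return the characters visible in a fixed window around `center`.

    Hypothetical helper: a window-based tagger sees only these characters
    when it labels the character at position `center`.
    """
    return [
        chars[i] if 0 <= i < len(chars) else pad
        for i in range(center - window, center + window + 1)
    ]


sentence = list("南京邮电大学在江苏")
# Context available when labeling the first character (left side is padding).
print(window_features(sentence, 0))
# Context available when labeling the fifth character; the first two
# characters of the sentence are already outside the window.
print(window_features(sentence, 4))
```

With a window of two characters on each side, anything further away is invisible to the tagger, which is the limitation the paragraph above describes.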

[0006] A recurrent neural network can automatically learn more complex features by accumulating historical memory and thus make full use of context. In practice, however, recurrent neural networks suffer from exploding and vanishing gradients, so they cannot handle long-distance historical memory well.
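A rough numerical illustration of why gradients vanish or explode, assuming the simplest possible case: a scalar linear recurrence h_t = w * h_{t-1}. This toy model and the name `gradient_through_time` are assumptions for illustration, not the network described in the patent.

```python
def gradient_through_time(w, steps):
    """Gradient of h_T with respect to h_0 for h_t = w * h_{t-1}.

    Backpropagation through time multiplies one local derivative
    (here simply w) per time step, so the result is w ** steps.
    """
    grad = 1.0
    for _ in range(steps):
        grad *= w  # dh_t / dh_{t-1} = w
    return grad


for w in (0.5, 1.0, 1.5):
    print(w, gradient_through_time(w, 50))
```

A factor below 1 shrinks the gradient toward zero over 50 steps, while a factor above 1 grows it without bound. Real recurrent networks multiply Jacobian matrices rather than scalars, but the same compounding effect applies.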

Examples

Embodiment 1

[0252] A Chinese word segmentation method based on deep learning, comprising the following steps:

[0253] Step 1: Compute character-frequency statistics over the large-scale corpus D. Using the CBOW model with hierarchical softmax (HS) training, initialize each character in D as a basic distributed character vector, and store the resulting vectors, keyed by index, in dictionary V.

[0254] Step 2: Convert the training corpus, sentence by sentence, into fixed-length vectors and feed them into the improved bidirectional LSTM model. Training the parameters of the bidirectional LSTM model refines and updates the character-level vectors in dictionary V, yielding feature vectors that carry contextual semantics and vectors that contain character features.

[0255] Step 3: For each training sentence, proceeding character by character, apply full segmentation to enumerate all candidate words that end at the current character and fall within the maximum word length, and fuse t...
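The candidate enumeration in Step 3 can be sketched as follows. This covers only the listing of candidate words that end at the current character; the fusion of character vectors into word vectors is not shown because the source text is truncated at that point. The function name `candidates_ending_at` and the parameter `max_word_len` are assumptions for illustration.

```python
def candidates_ending_at(chars, end, max_word_len=4):
    """List every candidate word that ends at index `end` (inclusive).

    Candidates are returned shortest first and never exceed
    `max_word_len` characters or run past the start of the sentence.
    """
    candidates = []
    for length in range(1, max_word_len + 1):
        start = end - length + 1
        if start < 0:
            break  # ran out of characters on the left
        candidates.append("".join(chars[start:end + 1]))
    return candidates


sentence = list("深度学习")
for i, ch in enumerate(sentence):
    print(ch, candidates_ending_at(sentence, i, max_word_len=3))
```

Each position yields at most `max_word_len` candidates, so the total number of candidates grows linearly with sentence length rather than exponentially.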

Abstract

The invention discloses a Chinese word segmentation method based on deep learning, comprising the following steps: mapping Chinese characters to character vectors based on character frequencies; refining the character vectors and extracting feature vectors that carry contextual semantic information and feature vectors that carry character features; fusing the character-level vectors into word-level distributed representations, feeding the fused candidate word vectors into the deep learning model to compute sentence scores, decoding with beam search, and selecting the appropriate segmentation result according to the sentence score. In this way, the word segmentation task is freed from tedious feature engineering, better system performance is obtained by extracting richer feature information, and the complete segmentation history is used for modeling, giving the method sequence-level word segmentation capability.
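The beam-search decoding mentioned in the abstract might look roughly like the sketch below. In the patent the sentence score comes from the deep learning model; here a toy lexicon scorer (`TOY_LEXICON`, `toy_word_score`) stands in for it. All names, scores, and default parameters are hypothetical and chosen only to make the example runnable.

```python
# Hypothetical stand-in for the learned scoring model.
TOY_LEXICON = {"深度": 2.0, "学习": 2.0, "深度学习": 3.0, "分词": 2.0, "中文": 2.0}


def toy_word_score(word):
    """Score a candidate word: lexicon hit, else a penalty by length."""
    if word in TOY_LEXICON:
        return TOY_LEXICON[word]
    return -1.0 if len(word) == 1 else -5.0


def beam_search_segment(text, word_score, beam_size=2, max_word_len=4):
    """Return (score, words) for the best segmentation found by beam search.

    beams[i] keeps at most `beam_size` partial segmentations that cover
    text[:i]; the sentence score is the sum of the word scores.
    """
    beams = [[(0.0, [])]] + [[] for _ in text]
    for end in range(1, len(text) + 1):
        expanded = []
        for start in range(max(0, end - max_word_len), end):
            word = text[start:end]
            for score, words in beams[start]:
                expanded.append((score + word_score(word), words + [word]))
        expanded.sort(key=lambda item: item[0], reverse=True)
        beams[end] = expanded[:beam_size]  # prune to the beam width
    return beams[len(text)][0]


score, words = beam_search_segment("中文分词", toy_word_score)
print(score, words)
```

Because each hypothesis carries its full list of words, a scorer that depends on the whole segmentation history could replace `toy_word_score` without changing the search loop, at the cost of re-scoring each extended hypothesis.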

Description

Technical field

[0001] The invention relates to the technical field of natural language processing, in particular to a Chinese word segmentation method based on deep learning.

Background

[0002] In the current big data environment, with the rapid development of IoT sensing, cloud computing, triple play, and the mobile Internet, the amount of data, especially unstructured text, has grown exponentially. The data exhibit characteristics such as diverse and heterogeneous types, information fragmentation, and low value density. This rapid expansion of data poses great challenges for automatic information processing. How to process massive texts efficiently and accurately and extract valuable information has become an important topic in Natural Language Processing (NLP).

[0003] In the field of natural language processing, especially in Chinese natural language processing, word segmentation is an important benchmark task, and the p...

Application Information

Patent Type & Authority: Patents (China)

IPC(8): G06F40/289, G06N3/02

CPC: G06N3/02, G06F40/289

Inventor: 王传栋, 史宇, 李智

Owner: NANJING UNIV OF POSTS & TELECOMM
