A Chinese Word Segmentation Method Using Word Context-Based Embedding and Neural Networks
A technique based on neural networks and context, applied in biological neural network models, semantic analysis, electrical digital data processing, etc. It addresses problems such as the failure to make full use of word information.
Examples
Embodiment 1
[0145] First, the annotated data used in this embodiment is the Penn Chinese Treebank 6.0 (CTB6.0), comprising 23,401 sentences in the training set, 2,078 sentences in the development set, and 2,795 sentences in the test set. The automatically segmented data is drawn from Chinese Gigaword (LDC2011T13) and totals 41,071,242 sentences.
[0146] This embodiment applies the complete procedure of the present invention's Chinese word segmentation method, which uses word context-based word embeddings and a neural network, as follows:
[0147] Step 1-1: Determine the tagging scheme of the character-level labeling model by defining four tags, B, M, E, and S. Their specific meanings are given in 1-1 of the specification.
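The item 1-1 that defines the tags is not reproduced in this extract. Under the common convention for this scheme, B marks the first character of a multi-character word, M an inner character, E the last character, and S a single-character word. The minimal sketch below assumes that convention. It converts between a segmented sentence and its tag sequence. The function names and the example sentence are hypothetical.

```python
from typing import List


def words_to_bmes(words: List[str]) -> List[str]:
    """Map a segmented sentence to one B/M/E/S tag per character (assumed convention)."""
    tags: List[str] = []
    for w in words:
        if len(w) == 1:
            tags.append("S")                 # single-character word
        else:
            tags.append("B")                 # first character of the word
            tags.extend("M" * (len(w) - 2))  # inner characters, if any
            tags.append("E")                 # last character of the word
    return tags


def bmes_to_words(chars: str, tags: List[str]) -> List[str]:
    """Rebuild words from characters and their B/M/E/S tags."""
    words: List[str] = []
    current = ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):  # a word ends at this character
            words.append(current)
            current = ""
    if current:                # tolerate an unterminated trailing word
        words.append(current)
    return words


# Hypothetical example sentence, segmented as 我 / 爱 / 北京 / 天安门.
example = ["我", "爱", "北京", "天安门"]
print(words_to_bmes(example))  # ['S', 'S', 'B', 'E', 'B', 'M', 'E']
```

Because segmentation becomes one tag per character, it can be treated as a sequence labeling problem. Words are recovered by cutting after every E or S.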
[0148] Step 1-2: Train on the automatically segmented Chinese Gigaword data to obtain the unigram embedding matrix e_uni and the bigram embedding matrix e_bi.
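The text does not state how e_uni and e_bi are combined at each position. One common design concatenates the embeddings of the unigrams and bigrams in a small context window around the current character to form the network input. The sketch below assumes that design. The window size, the `<pad>` symbol, the zero vector for unknown keys, and all function names are illustrative assumptions, not the patent's specification.

```python
from typing import Dict, List, Tuple

PAD = "<pad>"  # hypothetical padding symbol for positions outside the sentence


def context_keys(sent: str, i: int, window: int = 1) -> Tuple[List[str], List[str]]:
    """Unigram and bigram keys around character i (window size is an assumption)."""
    padded = [PAD] * window + list(sent) + [PAD] * window
    j = i + window  # index of character i inside the padded list
    unigrams = padded[j - window: j + window + 1]
    bigrams = [padded[k] + padded[k + 1] for k in range(j - window, j + window)]
    return unigrams, bigrams


def lookup(table: Dict[str, List[float]], key: str, dim: int) -> List[float]:
    # Unknown keys (including padding) map to a zero vector in this sketch.
    return table.get(key, [0.0] * dim)


def position_input(sent: str, i: int, e_uni: Dict[str, List[float]],
                   e_bi: Dict[str, List[float]], dim: int, window: int = 1) -> List[float]:
    """Concatenate unigram then bigram embeddings for position i."""
    unis, bis = context_keys(sent, i, window)
    vec: List[float] = []
    for u in unis:
        vec.extend(lookup(e_uni, u, dim))
    for b in bis:
        vec.extend(lookup(e_bi, b, dim))
    return vec


# Toy 2-dimensional tables standing in for e_uni and e_bi.
e_uni = {"我": [1.0, 0.0], "爱": [0.0, 1.0]}
e_bi = {"我爱": [0.5, 0.5]}
print(len(position_input("我爱北京", 0, e_uni, e_bi, dim=2)))  # 10
```

With `window=1`, each position yields 3 unigram vectors and 2 bigram vectors, so the input has 5 × `dim` entries.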
[0149] Step 2-1: Read a Chinese sentence meaning "you come right now", and compute the score of each tag at every position:
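The scoring details in [0150] onward are elided here. A common way to turn per-position tag scores into a segmentation is Viterbi decoding restricted to valid BMES transitions. The sketch below shows that technique and assumes the network's scores are already given as a dict per position. The score values are hypothetical toy numbers, and the decoder is not necessarily the one the patent uses.

```python
from typing import Dict, List, Optional

TAGS = ("B", "M", "E", "S")

# Valid BMES transitions: after B or M the word continues (M) or ends (E);
# after E or S a new word starts (B or S).
ALLOWED = {
    "B": ("M", "E"),
    "M": ("M", "E"),
    "E": ("B", "S"),
    "S": ("B", "S"),
}


def viterbi_bmes(scores: List[Dict[str, float]]) -> List[str]:
    """Return the highest-scoring tag sequence that obeys the BMES constraints.

    scores[i][t] is the (assumed) network score of tag t at position i.
    """
    n = len(scores)
    if n == 0:
        return []
    neg = float("-inf")
    best = [{t: neg for t in TAGS} for _ in range(n)]
    back: List[Dict[str, Optional[str]]] = [{t: None for t in TAGS} for _ in range(n)]
    for t in ("B", "S"):  # the first character must begin a word
        best[0][t] = scores[0][t]
    for i in range(1, n):
        for prev in TAGS:
            if best[i - 1][prev] == neg:
                continue  # unreachable state
            for t in ALLOWED[prev]:
                cand = best[i - 1][prev] + scores[i][t]
                if cand > best[i][t]:
                    best[i][t] = cand
                    back[i][t] = prev
    # The last character must end a word.
    last = max(("E", "S"), key=lambda t: best[n - 1][t])
    path = [last]
    for i in range(n - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


# Hypothetical scores for a 4-character sentence. Per-position argmax would
# give S S B M, which ends mid-word; the constrained decoder avoids that.
scores = [
    {"B": 0.1, "M": 0.0, "E": 0.0, "S": 2.0},
    {"B": 0.2, "M": 0.0, "E": 0.0, "S": 1.5},
    {"B": 1.0, "M": 0.9, "E": 0.0, "S": 0.0},
    {"B": 0.0, "M": 1.2, "E": 1.0, "S": 0.0},
]
print(viterbi_bmes(scores))  # ['S', 'S', 'B', 'E']
```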
[0150]...
Embodiment 2
[0158] All algorithms of the present invention are implemented in C++. The experiments in this embodiment were run on a machine with an Intel(R) Core(TM) i7-4790K processor (4.0 GHz) and 24 GB of memory. The annotated data used in this embodiment is the Penn Chinese Treebank 6.0 (CTB6.0), comprising 23,401 sentences in the training set, 2,078 sentences in the development set, and 2,795 sentences in the test set. The automatically segmented data is drawn from Chinese Gigaword (LDC2011T13) and totals 41,071,242 sentences. The model parameters are trained on the Gigaword and CTB6.0 data. The experimental results are shown in Table 1:
[0159] Table 1: Experimental results
[0162] Among the compared systems, Xu and Sun (2016) used a word segmentation model based on a dependency-based recursive neural network, Liu (2016) used a word segmentation model built on segment representations, and Zhang (2016) was a tra...
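The figures in Table 1 are not reproduced in this extract. Chinese word segmentation results on CTB6.0 are conventionally reported as word-level precision, recall, and F1, and it is an assumption that Table 1 uses these metrics. The sketch below computes them by treating each word as a character-offset span and counting exact span matches. The function names are hypothetical.

```python
from typing import List, Set, Tuple


def spans(words: List[str]) -> Set[Tuple[int, int]]:
    """Character-offset (start, end) span of every word in a segmented sentence."""
    out: Set[Tuple[int, int]] = set()
    pos = 0
    for w in words:
        out.add((pos, pos + len(w)))
        pos += len(w)
    return out


def prf(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Word-level precision, recall, and F1 over a corpus of sentence pairs."""
    correct = n_gold = n_pred = 0
    for g, p in zip(gold, pred):
        gs, ps = spans(g), spans(p)
        correct += len(gs & ps)  # a word counts only if both boundaries match
        n_gold += len(gs)
        n_pred += len(ps)
    precision = correct / n_pred if n_pred else 0.0
    recall = correct / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


gold = [["我", "爱", "北京", "天安门"]]
pred = [["我", "爱北京", "天安门"]]
print(tuple(round(x, 4) for x in prf(gold, pred)))  # (0.6667, 0.5, 0.5714)
```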