Chinese word segmentation method by using character embedding based on word context and neural network
A neural-network-based Chinese word segmentation technique, classified under biological neural network models, semantic analysis, and electrical digital data processing, that addresses problems such as the insufficient use of word information.
Examples
Embodiment 1
[0145] First, the labeled data used in this example is the Penn Chinese Treebank (CTB 6.0), with 23,401 sentences in the training set, 2,078 in the development set, and 2,795 in the test set. The automatically segmented data consists of 41,071,242 sentences drawn from Chinese Gigaword (LDC2011T13).
[0146] In this embodiment, the complete process of applying the Chinese word segmentation method of the present invention, which uses character embeddings based on word context together with a neural network, is as follows:
[0147] Step 1-1: determine the tagging scheme of the character tagging model by defining four tag types, B, M, E, and S; see Step 1-1 of the description for their specific meanings;
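The four tags are conventionally read as B (first character of a multi-character word), M (middle character), E (last character), and S (single-character word). The following is a minimal Python sketch of that convention, not the patent's own code; the function names `words_to_tags` and `tags_to_words` and the sample sentence are hypothetical and chosen only for illustration.

```python
def words_to_tags(words):
    """Map a list of segmented words to one B/M/E/S tag per character."""
    tags = []
    for word in words:
        if len(word) == 1:
            tags.append("S")  # single-character word
        else:
            # first character, zero or more middle characters, last character
            tags.extend(["B"] + ["M"] * (len(word) - 2) + ["E"])
    return tags


def tags_to_words(chars, tags):
    """Rebuild words from characters and their B/M/E/S tags."""
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):  # a word ends at this character
            words.append(current)
            current = ""
    if current:  # tolerate a tag sequence that stops mid-word
        words.append(current)
    return words


# Illustrative sentence (an assumption, not taken from the patent).
sample_words = ["我", "喜欢", "自然语言"]
sample_tags = words_to_tags(sample_words)
```

Because the mapping is reversible, predicting one tag per character is equivalent to predicting the segmentation itself.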
[0148] Step 1-2: train on the automatically segmented Chinese Gigaword data to obtain the character embedding matrix e_uni and the character-bigram embedding matrix e_bi;
[0149] Step 2-1: read in a Chinese sentence, "You will come right away", and compute the score of each tag at each position:
[0150] 1. Your score(B)...
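This excerpt does not show how the per-position tag scores are combined into a final segmentation. One common approach, sketched below in Python purely as an assumption, is to sum the scores along a tag sequence and pick the highest-scoring sequence that respects the B/M/E/S ordering constraints (for example, B must be followed by M or E). The function name `decode`, the `ALLOWED` table, and all numeric scores are hypothetical.

```python
import math

TAGS = ["B", "M", "E", "S"]
# Assumed legal tag transitions under the B/M/E/S scheme.
ALLOWED = {
    "B": {"M", "E"},
    "M": {"M", "E"},
    "E": {"B", "S"},
    "S": {"B", "S"},
}


def decode(scores):
    """Return the best legal tag sequence.

    scores: list of dicts mapping each tag to a float, one dict per character.
    Assumes a sequence's score is the sum of its per-position scores.
    """
    n = len(scores)
    if n == 0:
        return []
    best = [{t: -math.inf for t in TAGS} for _ in range(n)]
    back = [{t: None for t in TAGS} for _ in range(n)]
    for t in ("B", "S"):  # a sentence can only start a word
        best[0][t] = scores[0][t]
    for i in range(1, n):
        for prev in TAGS:
            if best[i - 1][prev] == -math.inf:
                continue  # prev is unreachable at position i - 1
            for t in ALLOWED[prev]:
                cand = best[i - 1][prev] + scores[i][t]
                if cand > best[i][t]:
                    best[i][t] = cand
                    back[i][t] = prev
    # A sentence can only end on a completed word.
    last = max(("E", "S"), key=lambda t: best[n - 1][t])
    tags = [last]
    for i in range(n - 1, 0, -1):
        tags.append(back[i][tags[-1]])
    return tags[::-1]


# Made-up scores for a three-character input; not taken from the patent.
example_scores = [
    {"B": 2.0, "M": 0.0, "E": 0.0, "S": 1.0},
    {"B": 0.0, "M": 0.0, "E": 3.0, "S": 1.0},
    {"B": 5.0, "M": 0.0, "E": 0.0, "S": 1.0},
]
example_tags = decode(example_scores)
```

In the made-up example, picking the top-scoring tag at each position independently would end the sentence on B, which is not a valid segmentation; the constrained search avoids this.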
Embodiment 2
[0158] All algorithms used in the present invention are implemented in C++. The machine used for the experiments in this embodiment has an Intel(R) Core(TM) i7-4790K processor with a 4.0 GHz clock frequency and 24 GB of memory. The labeled data used in this example is the Penn Chinese Treebank (CTB 6.0), with 23,401 sentences in the training set, 2,078 in the development set, and 2,795 in the test set. The automatically segmented data consists of 41,071,242 sentences drawn from Chinese Gigaword (LDC2011T13). The model parameters are trained on the Gigaword data and the CTB 6.0 data. The experimental results are shown in Table 1:
[0159] Table 1. Experimental results
[0162] Among them, Xu and Sun (2016) adopted a word segmentation model based on a dependency-based recursive neural network, Liu (2016) used a word segmentation model based on segment representations, Zhang (2016) used a neural n...