Multi-criterion Chinese word segmentation method based on local self-attention mechanism and segmentation tree
A Chinese word segmentation technique based on attention, applicable to neural learning methods, natural language data processing, instruments, and related fields. It addresses problems such as unreasonable word combinations and the failure to make use of word segmentation criteria, and achieves the effect of improving accuracy and reducing impact.
Embodiment
[0093] Suppose the input sentence is X = "literary experts from all over the country walk out of the Great Hall of the People".
[0094] 1. First, obtain the unigram and bigram features of each character through word2vec, and combine them with the predefined position vectors to form the embedding layer.
[0095] 2. Pass the embedding layer to the self-attention network and obtain its output. The output of the self-attention network is decoded by a CRF, which labels each character and yields multiple labeling results. The results are shown in Figure 2.
[0096] 3. Combine these labeling results into a segmentation tree to generate multiple segmentation sequences. The generated segmentation tree is shown in Figure 3.
[0097] 4. Input the multiple segmentation sequences into the scoring system, and select the segmentation sequence with the highest score as the output. The scoring system is shown in Figure 4.
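Steps 3 and 4 can be illustrated with a minimal Python sketch. It is not the patented implementation: it assumes the CRF emits B/M/E/S tags, represents the segmentation tree as a simple map from each start position to its candidate end positions, and uses a hypothetical `fewest_words_score` function as a stand-in for the scoring system of Figure 4, which this section does not specify. The function names, the example phrase "人民大会堂" (Great Hall of the People), and the two labelings are illustrative assumptions.

```python
from typing import Callable, Dict, List, Set, Tuple


def bmes_to_spans(labels: List[str]) -> List[Tuple[int, int]]:
    """Convert a B/M/E/S tag sequence into (start, end) word spans (end exclusive)."""
    spans, start = [], 0
    for i, tag in enumerate(labels):
        if tag in ("E", "S"):  # a word closes at an End or Single tag
            spans.append((start, i + 1))
            start = i + 1
    if start < len(labels):  # tolerate a sequence that ends mid-word
        spans.append((start, len(labels)))
    return spans


def build_lattice(labelings: List[List[str]]) -> Dict[int, Set[int]]:
    """Merge the word spans of all labelings: start position -> candidate end positions."""
    edges: Dict[int, Set[int]] = {}
    for labels in labelings:
        for start, end in bmes_to_spans(labels):
            edges.setdefault(start, set()).add(end)
    return edges


def enumerate_segmentations(chars: str, edges: Dict[int, Set[int]]) -> List[List[str]]:
    """Walk every path from position 0 to len(chars); each path is one segmentation."""
    results: List[List[str]] = []

    def walk(pos: int, words: List[str]) -> None:
        if pos == len(chars):
            results.append(list(words))
            return
        for end in sorted(edges.get(pos, ())):
            words.append(chars[pos:end])
            walk(end, words)
            words.pop()

    walk(0, [])
    return results


def best_segmentation(candidates: List[List[str]],
                      score: Callable[[List[str]], float]) -> List[str]:
    """Return the highest-scoring candidate segmentation."""
    return max(candidates, key=score)


def fewest_words_score(words: List[str]) -> float:
    """Hypothetical scorer (assumption): prefer segmentations with fewer words."""
    return -float(len(words))


# Two illustrative labelings of the same five characters under different criteria.
sentence = "人民大会堂"
labelings = [list("BMMME"), list("BEBME")]
candidates = enumerate_segmentations(sentence, build_lattice(labelings))
print(candidates)  # [['人民', '大会堂'], ['人民大会堂']]
print(best_segmentation(candidates, fewest_words_score))  # ['人民大会堂']
```

Passing the scorer as a parameter keeps the candidate generation independent of how the scoring is done, so a learned scoring model could replace the toy function without changing the rest of the sketch.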