Chinese short text entity identification and disambiguation method based on enhanced character vector
An entity recognition technique for short text, applied in the field of natural language processing, which addresses problems such as the difficulty of extracting useful semantic information from short text
Embodiment
[0117] The main steps of the first part, entity recognition, are:
[0118] 1.1 Input the Chinese short text "Bitcoin attracts countless fans" (比特币吸粉无数) and split it into the character sequence ['比', '特', '币', '吸', '粉', '无', '数'] (literally 'bi', 'special', 'coin', 'suck', 'fan', 'no', 'number'), which contains 7 characters; pre-train with the Word2vec method to obtain 300-dimensional character vectors;
[0119] 1.2 Feed the Chinese short text from 1.1 into the BERT language model, pre-trained on a large-scale corpus, to obtain 768-dimensional contextual character vectors;
[0120] 1.3 Cut the Chinese short text from 1.1 into the character bigram sequence ['比特', '特币', '币吸', '吸粉', '粉无', '无数'] (six overlapping two-character units), and then train with the Word2vec method to obtain 300-dimensional adjacent-character vectors.
[0121] 1.4 For the Chinese short text from 1.1, load the mention dictionary into the jieba word segmentation tool and then perform word segmentation. The resulting word sequence is: [...
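The character split of step 1.1 and the bigram cut of step 1.3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the input string 比特币吸粉无数 is taken to be the Chinese original of "Bitcoin attracts countless fans", and the function names `to_characters` and `char_bigrams` are hypothetical, not names used in the patent. The Word2vec training, the BERT encoding, and the jieba segmentation are not shown.

```python
def to_characters(text: str) -> list[str]:
    """Step 1.1 (sketch): split a short text into single characters, skipping whitespace."""
    return [ch for ch in text if not ch.isspace()]


def char_bigrams(text: str) -> list[str]:
    """Step 1.3 (sketch): overlapping two-character units built from adjacent characters."""
    chars = to_characters(text)
    # Pair each character with its right-hand neighbour.
    return [left + right for left, right in zip(chars, chars[1:])]


# Assumed Chinese form of the example sentence.
sentence = "比特币吸粉无数"
characters = to_characters(sentence)
bigrams = char_bigrams(sentence)
print(len(characters), characters)
print(len(bigrams), bigrams)
```

A text of n characters yields n - 1 bigrams, so the 7-character example gives 6 adjacent-character units. Each sequence would then be passed to a separate embedding step to obtain its own 300-dimensional vectors.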