Doc2vec-based similar entity mining method
A similar-entity mining technique, applied in the field of similar document mining, that achieves strong scalability, comprehensive vector representations, and strong portability.
Embodiment Construction
[0037] Step 1: word2vec calculation
[0038] 1.1 Word segmentation
[0039] To compute word2vec on Chinese text, the corpus must first be segmented into words.
[0040] The current mainstream approach to Chinese word segmentation works as follows. For words in the dictionary, an efficient word-graph scan based on a prefix dictionary generates a directed acyclic graph (DAG) of all possible words that the Chinese characters in a sentence can form, and dynamic programming then finds the maximum-probability path, that is, the best segmentation according to word frequency. For out-of-vocabulary words, an HMM that models the ability of Chinese characters to form words is used, and the model is solved with the Viterbi algorithm.
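The dictionary-based half of the procedure in [0040] can be illustrated with a minimal Python sketch. It is not the patent's implementation: the dictionary `FREQ`, its frequencies, the floor frequency of 1 for unknown strings, and the function names `build_dag` and `segment` are all assumptions made for illustration. The HMM/Viterbi handling of out-of-vocabulary words is omitted, so unknown characters simply fall back to single-character words.

```python
import math

# Hypothetical toy dictionary (word -> frequency); values are illustrative only.
FREQ = {"研究": 5, "研究生": 3, "生命": 4, "的": 10, "起源": 3}
TOTAL = sum(FREQ.values())
MAX_LEN = max(len(w) for w in FREQ)


def log_prob(word):
    # Strings missing from the dictionary get a floor frequency of 1 (an assumption).
    return math.log(FREQ.get(word, 1) / TOTAL)


def build_dag(sentence):
    """Map each start index to the inclusive end indices of every candidate word."""
    dag = {}
    for i in range(len(sentence)):
        ends = [i]  # a single character is always kept as a fallback candidate
        for j in range(i + 1, min(i + MAX_LEN, len(sentence))):
            if sentence[i:j + 1] in FREQ:
                ends.append(j)
        dag[i] = ends
    return dag


def segment(sentence):
    """Return the maximum-probability segmentation under the unigram model."""
    dag = build_dag(sentence)
    n = len(sentence)
    # route[i] = (best log-probability of sentence[i:], end index of its first word)
    route = {n: (0.0, n)}
    for i in range(n - 1, -1, -1):  # dynamic programming, right to left
        route[i] = max(
            (log_prob(sentence[i:j + 1]) + route[j + 1][0], j) for j in dag[i]
        )
    words, i = [], 0
    while i < n:  # follow the stored best choices from left to right
        j = route[i][1]
        words.append(sentence[i:j + 1])
        i = j + 1
    return words


print(segment("研究生命的起源"))
```

With this toy dictionary, both "研究" and "研究生" are candidate words at position 0. The dynamic program picks whichever makes the whole sentence most probable, rather than greedily taking the longest match.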
[0041] Relatively mature Chinese word segmentation tools include IKAnalyzer and PaodingAnalyzer, among others.
[0042] 1.2 word2vec
[0043] Unsupervised learning of word embeddings has achieved unprecedented success in many natural language processing tasks. Words...