Doc2vec-based similar entity mining method
A doc2vec-based similar-entity mining technique, applied in the field of similar document mining, that achieves strong scalability, comprehensive vector representation, and strong portability
Example Embodiment
[0037] Step 1: word2vec calculation
[0038] 1.1 Word segmentation
[0039] For Chinese word2vec calculation, the corpus should be segmented first.
[0040] The current mainstream approach to Chinese word segmentation is as follows. For registered (in-dictionary) words, an efficient word-graph scan based on a prefix dictionary generates a directed acyclic graph (DAG) of all possible words that the Chinese characters in the sentence can form; dynamic programming then finds the maximum-probability path, that is, the segmentation with the highest probability based on word frequency. For unregistered (out-of-vocabulary) words, an HMM model based on the word-forming ability of Chinese characters is used, and the model is solved with the Viterbi algorithm.
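The dictionary-based half of this procedure (prefix-dictionary scan, DAG construction, and dynamic programming over word frequencies) can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the patent: the dictionary `FREQ` and its counts are invented toy values, the function names `build_dag` and `segment` are hypothetical, and the HMM/Viterbi handling of unregistered words is not covered (unknown characters simply fall back to single-character words).

```python
import math

# Toy prefix dictionary: word -> frequency.
# ASSUMPTION: these counts are invented for illustration only.
FREQ = {
    "研究": 50, "研究生": 30, "生命": 40, "命": 5, "生": 8,
    "的": 200, "起源": 20, "研": 2, "究": 2, "起": 6, "源": 4,
}
TOTAL = sum(FREQ.values())


def build_dag(sentence):
    """Map each start index i to every end index j such that
    sentence[i:j+1] is a dictionary word."""
    n = len(sentence)
    dag = {}
    for i in range(n):
        ends = [j for j in range(i, n) if sentence[i:j + 1] in FREQ]
        # Unknown character: fall back to a single-character word.
        dag[i] = ends or [i]
    return dag


def segment(sentence):
    """Return the maximum-probability segmentation under a unigram
    word-frequency model, using right-to-left dynamic programming."""
    n = len(sentence)
    dag = build_dag(sentence)
    log_total = math.log(TOTAL)
    # route[i] = (best log-probability of sentence[i:], end index of first word)
    route = {n: (0.0, 0)}
    for i in range(n - 1, -1, -1):
        route[i] = max(
            (math.log(FREQ.get(sentence[i:j + 1], 1)) - log_total
             + route[j + 1][0], j)
            for j in dag[i]
        )
    # Follow the stored end indices to recover the words.
    words, i = [], 0
    while i < n:
        j = route[i][1]
        words.append(sentence[i:j + 1])
        i = j + 1
    return words


if __name__ == "__main__":
    print(segment("研究生命的起源"))
```

With these toy counts, the path "研究 / 生命" outscores "研究生 / 命" because the product of its word frequencies is larger, which is the frequency-based disambiguation the paragraph describes. Working in log space avoids numeric underflow on long sentences.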
[0041] Mature existing Chinese word segmentation tools include IKAnalyzer and PaodingAnalyzer, among others.
[0042] 1.2 word2vec
[0043] Unsupervised learning of word embedding has achieved unprecedented success in many natural language processing tasks. The words (and possibly phr...