Word segmentation method supporting a large number of lexicons, computer-readable storage medium, and system
A word segmentation method and word segmentation algorithm technology, applied in computing, instruments, electrical digital data processing, etc. It addresses problems such as slow performance, not considering the whole, and not supporting a large number of lexicons, with the effect of making segmentation more reasonable and improving word segmentation efficiency.
Examples
Embodiment 1
[0043] This embodiment proposes a word segmentation method that supports a large number of lexicons. Its flow is shown in Figure 1 and includes the following steps:
[0044] Step 1: Build a domain dictionary, and establish a first-level index and a second-level index for each word in the domain dictionary whose length is greater than N. The key of the first-level index is the first M characters of the word, and its value is the length of the word. The key of the second-level index is the combination of the first M characters of the word and the length of the word, and its value is the hash mapping result of the word.
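The index structure in Step 1 can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: it treats the leading M units of a word as its first M characters, and the values chosen for `N` and `M`, the use of MD5 as the "hash mapping", the use of sets to hold several lengths or hashes under one key, and the `longest_match` lookup helper are all assumptions made for the example.

```python
import hashlib
from collections import defaultdict

N = 3  # assumption: only words longer than N characters are indexed
M = 2  # assumption: number of leading characters used in the index keys


def word_hash(word: str) -> str:
    """Stand-in for the 'hash mapping result' of a word (MD5 is an assumption)."""
    return hashlib.md5(word.encode("utf-8")).hexdigest()


def build_indexes(words):
    """Build first-level and second-level indexes for words longer than N."""
    first_level = defaultdict(set)   # first M chars -> lengths of words with that prefix
    second_level = defaultdict(set)  # (first M chars, length) -> hashes of matching words
    for word in words:
        if len(word) <= N:
            continue  # short words are not indexed
        prefix = word[:M]
        first_level[prefix].add(len(word))
        second_level[(prefix, len(word))].add(word_hash(word))
    return first_level, second_level


def longest_match(text, start, first_level, second_level):
    """Hypothetical lookup: longest indexed word beginning at text[start], or None."""
    prefix = text[start:start + M]
    # Try the candidate lengths recorded for this prefix, longest first.
    for length in sorted(first_level.get(prefix, ()), reverse=True):
        candidate = text[start:start + length]
        if len(candidate) != length:
            continue  # not enough text left for a word of this length
        if word_hash(candidate) in second_level.get((prefix, length), ()):
            return candidate
    return None


first, second = build_indexes(["machine", "machinery", "map"])
print(longest_match("machinery parts", 0, first, second))
```

Under these assumptions, the first-level index narrows a lookup to the few lengths that exist for a given prefix, and the second-level index confirms a candidate with a single hash comparison instead of scanning the whole dictionary.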
[0045] Specifically, the domain dictionary may be a domain dictionary of one domain, or multiple domain dictionaries of different domains, and each domain dictionary has an identifier indicating a corresponding domain.
[0046] In the domain dictionary, a primary index and a secondary index are also established for ...
Embodiment 2
[0095] This embodiment proposes a computer-readable storage medium on which a computer program is stored, and when the computer program is executed by a processor, the word segmentation method is implemented.
Embodiment 3
[0097] This embodiment proposes a word segmentation system that supports a large number of lexicons and implements the word segmentation method. Referring to Figure 1, the system includes an offline model unit, a domain dictionary module, a domain search module, and a word segmentation module, where:
[0098] The domain dictionary module stores pre-built domain dictionaries for different domains. Each word longer than N in a domain dictionary has a first-level index and a second-level index. This module opens the dictionary to users, allowing them to dynamically add new words and custom words. The module also provides a dictionary management function through which users can manage dictionaries, for example:
[0099] Users can label words with their domains to facilitate searching by domain;
[0100] Users can mark words according to the 4-tag method to facilitate offline training;
[0101] On the management page, users can make these annotations take effect...
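The text does not define the "4-tag method". In word segmentation the term commonly refers to the B/M/E/S scheme (begin, middle, end, single-character word). Assuming that is what is meant here, a minimal sketch of turning segmented words into per-character training labels could look like this; the function name `bmes_tags` is hypothetical.

```python
def bmes_tags(words):
    """Label every character of every word with B, M, E, or S.

    Assumes the '4-tag method' means the common B/M/E/S scheme:
    B = first character of a multi-character word, M = middle character,
    E = last character, S = single-character word.
    """
    tags = []
    for word in words:
        if len(word) == 1:
            tags.append("S")
        elif len(word) > 1:
            tags.extend(["B"] + ["M"] * (len(word) - 2) + ["E"])
    return tags


print(bmes_tags(["ab", "c", "def"]))
```

Labels of this kind give each character a class, which is the usual input format for training a sequence-labeling segmenter offline.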