Method for generating new words for a Chinese semantic library
A semantic-library and new-word technology applied in natural language data processing, special-purpose data processing applications, instruments, etc. It solves problems that consume a lot of time and manpower, and so saves both time and manpower.
Examples
Embodiment 1
[0019] As shown in Figure 1, a method for generating new words for a Chinese semantic library works as follows. A text block is set up as the corpus and scanned, and words that appear adjacent to each other form a set. If this set is not in the dictionary, its number of occurrences is counted; if that number exceeds a threshold, the adjacent words are identified as a new word and added to the semantic library.
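A minimal Python sketch of the procedure in [0019], under stated assumptions: the text block is already split into single characters, only pairs of adjacent units are considered, the dictionary and the semantic library are plain Python sets, and "exceeds the threshold" is read as strictly greater than. The function names `find_new_words` and `update_semantic_library` are hypothetical, not taken from the patent.

```python
from collections import Counter


def find_new_words(units, dictionary, threshold):
    """Return adjacent unit pairs that are not in `dictionary` and occur
    more than `threshold` times in the text block.

    `units` is the text block already split into basic units (assumed here
    to be single Chinese characters); `dictionary` is a set of known words.
    """
    counts = Counter()
    # Scan the text block once, looking at every pair of adjacent units.
    for left, right in zip(units, units[1:]):
        candidate = left + right
        # Only candidates missing from the dictionary are counted.
        if candidate not in dictionary:
            counts[candidate] += 1
    # Assumption: "exceeds the threshold" means strictly greater than.
    return {word for word, n in counts.items() if n > threshold}


def update_semantic_library(library, units, threshold):
    """Add every detected new word to `library` (a set, updated in place)
    and return the set of new words."""
    new_words = find_new_words(units, library, threshold)
    library |= new_words
    return new_words
```

For example, with the text 人美人美人美, the dictionary {美人} and a threshold of 2, the unseen pair 人美 occurs three times and is returned, while 美人 is skipped because it is already in the dictionary.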
[0020] The text block is composed of a single-word set, a double-word set, and a set formed by combining the single-word set with the double-word set.
[0021] Occurrences of adjacent words in the text block are counted from the offset of each word in the text: an offset vector is built recording where each word appears, and the offset vectors are then tallied to count the number of times words appear adjacently.
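The offset-vector counting in [0021] can be sketched as follows, assuming the basic units are single characters and that two occurrences count as adjacent when the right-hand character's offset is exactly one greater than the left-hand character's. `build_offset_vectors` and `count_adjacent` are hypothetical names; the patent does not specify how the vectors are compared.

```python
from collections import defaultdict


def build_offset_vectors(text):
    """First scan: map each character to the ascending list of offsets
    at which it occurs in `text`."""
    offsets = defaultdict(list)
    for pos, ch in enumerate(text):
        offsets[ch].append(pos)
    return dict(offsets)


def count_adjacent(offsets, left, right):
    """Count how often `left` is immediately followed by `right`.

    An occurrence of `left` at offset p is adjacent to `right` when
    `right` also occurs at offset p + 1.
    """
    # A set gives average constant-time membership tests for offsets.
    right_positions = set(offsets.get(right, []))
    return sum(1 for p in offsets.get(left, []) if p + 1 in right_positions)
```

In 人美人美人美, 人 occurs at offsets [0, 2, 4] and 美 at [1, 3, 5], so 人 followed by 美 is counted 3 times and 美 followed by 人 twice.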
[0022] The word collection is obtained from user logs and database f...
Embodiment 2
[0024] In Chinese, a single character can form a word, so a single character can serve as the basic word unit. Let a character be Wn; all Chinese characters then form the set W = {W1, W2, W3, ..., Wn}, containing n distinct Chinese characters.
[0025] Next, define the double-word set Y = {Y1, Y2, Y3, ..., Ym}, whose elements are segmented words of the form Ym ∈ {Wi-Wj, Wj-Wi}, where i and j range from 1 to n and the symbol '-' denotes joining two characters in that order. For example, the characters 美 ('beautiful') and 人 ('person') give two combinations, 美人 ('meiren', 'beauty') and 人美 ('renmei'). Y consists of all meaningful words composed of two characters.
[0026] Similarly, two-character words and single-character words can be combined into the set N = {Wi-Yj, Yj-Wi}, where i ranges from 1 to n, j ranges from 1 to m, and '-' again denotes joining in order, for example Love-Beauty and Beauty-Love.
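To make the sets W, Y and N concrete, the sketch below counts, in a text, the adjacent character pairs (Wi-Wj) and the single-character/two-character combinations (Wi-Yj and Yj-Wi). It assumes Y is supplied as a set of known two-character words and keys each combination by an ordered tuple, so that, for instance, ('爱', '美人') and ('美人', '爱') (love-beauty and beauty-love) are counted separately. The function name `count_combinations` is hypothetical.

```python
from collections import Counter


def count_combinations(text, two_char_words):
    """Count the combinations W-W, W-Y and Y-W that occur in `text`.

    `two_char_words` plays the role of the set Y. Each combination is keyed
    by an ordered tuple, so (W, Y) and (Y, W) are kept apart.
    """
    counts = Counter()
    for i in range(len(text) - 1):
        # W-W: every pair of adjacent single characters.
        counts[(text[i], text[i + 1])] += 1
        y = text[i:i + 2]
        if y not in two_char_words:
            continue
        # W-Y: a single character immediately before a word from Y.
        if i > 0:
            counts[(text[i - 1], y)] += 1
        # Y-W: a single character immediately after a word from Y.
        if i + 2 < len(text):
            counts[(y, text[i + 2])] += 1
    return counts
```

With Y = {美人}, the text 美人爱美人 yields (美人, 爱) once, (爱, 美人) once and (美, 人) twice; counts like these could then be passed to the threshold test of Embodiment 1.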
[0027] All of the text is collected into a text block, which is then scanned a first time to record the offset of each word and build each word's offset vector.
[0028] Then c...