Meaningful string identification method and device
A recognition method and device, applied in the fields of instruments, computing, and electrical digital data processing, that addresses the low accuracy of meaningful string extraction and improves the probability of correct recognition.
Detailed Description of Embodiments
[0044] A specific implementation of the word segmentation algorithm is as follows:
[0045] Split the text into sentences at sentence separators (such as periods, exclamation marks, and question marks). Read one sentence and obtain its possible candidate strings. Any candidate string that contains a separator is filtered out. Then read the next sentence and repeat the same processing until all sentences have been processed.
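The sentence splitting and separator filtering described in [0045] can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names `split_sentences` and `filter_candidates` are hypothetical, and the separator sets are assumptions chosen for the example (the patent's own separator table is not reproduced here).

```python
import re

# Assumed sentence terminators: the text names period, exclamation mark
# and question mark; full-width variants are added as an assumption.
SENTENCE_END = r"[.!?。！？]"

# Assumed in-sentence separators, for illustration only.
SEPARATORS = set(",;:，；：、")


def split_sentences(text):
    """Split text at sentence terminators and drop empty pieces."""
    return [s.strip() for s in re.split(SENTENCE_END, text) if s.strip()]


def filter_candidates(candidates, separators=SEPARATORS):
    """Keep only the candidates that contain no separator character."""
    return [c for c in candidates if not any(ch in separators for ch in c)]
```

How the candidate strings of a sentence are generated is left open here; `filter_candidates` only applies the rule that a candidate containing a separator is discarded.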
[0046] A specific implementation of the n-gram segmentation algorithm is as follows: read the parameters N1 and N2, where N1 is the minimum number of words per segment and N2 is the maximum; split the text into sentences at sentence separators (such as periods, exclamation marks, and question marks); then extract from each sentence every candidate string of n words, with n ranging from N1 to N2.
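A minimal sketch of the n-gram extraction in [0046] for a single sentence, assuming the sentence has already been tokenized. The function name `ngram_candidates` is hypothetical, and whether a "word" is a character or a whitespace-delimited token is left to the caller, since the text does not fix it.

```python
def ngram_candidates(tokens, n1, n2):
    """Return every contiguous run of n tokens, for n from n1 to n2 inclusive.

    tokens: the words (or characters) of one sentence, in order.
    n1, n2: minimum and maximum segment length (N1 and N2 in the text).
    """
    candidates = []
    for n in range(n1, n2 + 1):
        # A sentence shorter than n yields no candidates of length n.
        for i in range(len(tokens) - n + 1):
            candidates.append(tuple(tokens[i:i + n]))
    return candidates
```

For example, `ngram_candidates(list("abc"), 1, 2)` yields the three single characters followed by the two adjacent pairs.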
[0047] The separator string can be preset, including the characters in Table...
Example 2
[0058] For example, the original corpus is: "Zuo Zhuan, Three Kingdoms... are all historical classics."
[0059] Again using the n-gram segmentation algorithm, the valid candidate strings extracted are: "Zuo Zhuan", "Three Kingdoms", "Guozhi", "Three Kingdoms", "Etc.", "Du", "Shili", "History", "all classics", "etc are all", "all calendars", "is history", "historical classics", "historical classics", "etc are all calendars", "all historical", "are historical classics", and "historical classics".
[0060] Step 102: a statistics step, which computes the distribution of adjacent separator strings for each valid candidate string in the original corpus;
[0061] Optionally, the statistical result for the distribution of adjacent separator strings of a valid candidate string in the corpus (also referred to herein as the separator string score) is the total number of adjacent separator strings over all instances of that valid candidate string. Generally, in a corpus, a certain...
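One way to read the separator string score in [0061] is as a pair of counts: how many instances of a candidate are immediately preceded by a separator, and how many are immediately followed by one. The sketch below follows that reading. The function name `separator_scores` and the separator set are assumptions, and the start and end of the corpus are not counted as separators, which the text does not specify.

```python
# Assumed separator characters, for illustration only.
SEPARATORS = set(",.!?;:，。！？；：、")


def separator_scores(corpus, candidate, separators=SEPARATORS):
    """Return (left, right) separator counts over all instances of candidate.

    left:  instances whose preceding character is a separator.
    right: instances whose following character is a separator.
    Overlapping instances are counted; corpus boundaries are not.
    """
    left = right = 0
    start = corpus.find(candidate)
    while start != -1:
        end = start + len(candidate)
        if start > 0 and corpus[start - 1] in separators:
            left += 1
        if end < len(corpus) and corpus[end] in separators:
            right += 1
        start = corpus.find(candidate, start + 1)
    return left, right
```

The two counts correspond to the left-neighbor score L and right-neighbor score R used when judging whether a candidate is a meaningful string.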
Example 7
[0085] Example 7: Example of judging meaningful strings
[0086] Suppose that the valid candidate strings and their left-neighbor and right-neighbor separator string scores are as follows:
[0087]
Valid candidate string | Left-neighbor separator string score (L) | Right-neighbor separator string score (R)
Three Kingdoms | 17 | 0
Three Kingdoms | 14 | 6
Zuo Zhuan | 14 | 10
Etc. | 1 | 0
[0088] Suppose the discriminant formula is:
$$F(L, R) = \begin{cases} 1, & L > 5 \text{ and } R > 5 \\ 0, & L \le 5 \text{ or } R \le 5 \end{cases}$$
[0089] That is, F(L, R) = 1 when (L > 5) and (R > 5), and F(L, R) = 0 when (L ≤ 5) or (R ≤ 5). F(L, R) = 1 denotes a meaningful string and F(L, R) = 0 denotes a string that is not meaningful; L is the left-neighbor separator string score and R is the right-neighbor separator string score. In other words, a valid candidate string whose left-neighbor and right-neighbor separator string scores are both greater than 5 is a meaningful string; otherwise it is not.
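The discriminant in [0089] can be applied to the scores listed in [0087] with a few lines of code. This is a sketch only: the function name `is_meaningful` and the `threshold` parameter are hypothetical, with the threshold defaulting to the value 5 used in this example.

```python
def is_meaningful(left_score, right_score, threshold=5):
    """F(L, R): 1 if both scores exceed the threshold, otherwise 0."""
    return 1 if left_score > threshold and right_score > threshold else 0


# (candidate, L, R) rows as listed in [0087].
rows = [
    ("Three Kingdoms", 17, 0),
    ("Three Kingdoms", 14, 6),
    ("Zuo Zhuan", 14, 10),
    ("Etc.", 1, 0),
]

results = [(name, is_meaningful(l, r)) for name, l, r in rows]
```

With these scores, the rows (14, 6) and (14, 10) evaluate to 1, while (17, 0) and (1, 0) evaluate to 0 because their right-neighbor score does not exceed 5.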
[0090]
[0091] This shows that...