New word discovery method and device, terminal and server

A new word discovery technology in the field of natural language processing that addresses the low efficiency of existing new word extraction and reduces manual workload and labor costs.

Active Publication Date: 2017-05-31
SHANGHAI XIAOI ROBOT TECH CO LTD
9 cites, 6 cited by

AI Technical Summary

Problems solved by technology

[0004] However, in the rule-based method, establishing comprehensive and complete rules remains an unsolved problem. In the statistical method, choosing the threshold is difficult, so many extracted candidates are not actually new words: the new word candidates contain junk strings (such as "doing housework", "this book", "time"). Filtering them out requires extensive manual work, and the efficiency is extremely low.

Embodiment Construction

[0040] As mentioned in the background, in the prior-art rule-based method, establishing comprehensive and complete rules remains an unsolved problem. In the statistical method, choosing the threshold is difficult, so many extracted candidates are not actually new words. As a result, the new word candidates contain junk strings (such as "doing housework", "this book", "time"), which must be filtered out with extensive manual work, and the efficiency is extremely low.

[0041] In the embodiment of the present invention, after the new word discovery operation, the sub-parts of each new word candidate are obtained from the word segmentation results of the original corpus; that is, the sub-parts of a new word candidate are all existing vocabulary words. The semantic similarity between the candidate word and its sub-parts is then computed; when the semantic similarity is less than the set threshold, this indicates t...
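The splitting step can be illustrated with a short Python sketch. It assumes, for illustration only, that the vocabulary is simply the set of words produced by segmenting the original corpus, and that when several valid splits exist the one with the fewest sub-parts is preferred. The function name `split_into_subparts` and the toy vocabulary are hypothetical and do not come from the patent.

```python
from functools import lru_cache
from typing import AbstractSet, Optional, Tuple


def split_into_subparts(
    candidate: str, vocabulary: AbstractSet[str]
) -> Optional[Tuple[str, ...]]:
    """Split `candidate` into at least two words that all appear in `vocabulary`.

    Hypothetical helper: `vocabulary` stands in for the words of the
    segmentation result. Among valid splits, the one with the fewest
    parts is returned; None means no such split exists.
    """
    n = len(candidate)

    @lru_cache(maxsize=None)
    def cover(start: int) -> Optional[Tuple[str, ...]]:
        # Fewest vocabulary words that exactly cover candidate[start:], or None.
        if start == n:
            return ()
        best: Optional[Tuple[str, ...]] = None
        for end in range(start + 1, n + 1):
            word = candidate[start:end]
            if word in vocabulary:
                rest = cover(end)
                if rest is not None and (best is None or len(rest) + 1 < len(best)):
                    best = (word,) + rest
        return best

    best: Optional[Tuple[str, ...]] = None
    # The first piece never spans the whole candidate, so any split found
    # here has at least two parts, as the method requires.
    for first_end in range(1, n):
        head = candidate[:first_end]
        if head in vocabulary:
            rest = cover(first_end)
            if rest is not None and (best is None or len(rest) + 1 < len(best)):
                best = (head,) + rest
    return best


# Toy vocabulary standing in for a segmentation result (made-up data).
vocab = {"新", "三", "板", "三板", "警示", "股票"}
print(split_into_subparts("新三板", vocab))  # ('新', '三板')
print(split_into_subparts("警示股", vocab))  # None
```

A candidate with no valid split (here 警示股, because 股 alone is not in the toy vocabulary) returns None. How the method treats such candidates is not stated in this excerpt.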

Abstract

The invention discloses a new word discovery method and device, a terminal, and a server. The method comprises the following steps: performing a new word discovery operation on a raw corpus to obtain new word candidates; segmenting the raw corpus to obtain a first segmentation result; splitting each new word candidate according to the first segmentation result to obtain its sub-parts, which are contained in the first segmentation result and comprise at least two words of that result; computing the semantic similarity between the new word candidate and its sub-parts; and determining the new word candidate to be a new word if the semantic similarity is smaller than a given threshold. The technical scheme improves the efficiency and accuracy of new word extraction.
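The final decision step can be sketched in Python as follows. This excerpt does not say how semantic similarity is computed, so the sketch assumes word vectors are available (for example from a word-embedding model trained on the corpus), represents the sub-parts by the mean of their vectors, and uses cosine similarity. The vectors, the 0.5 threshold, and the function names are all illustrative assumptions.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mean_vector(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Component-wise mean, used here (by assumption) to represent the sub-parts."""
    dim = len(vectors[0])
    return [sum(vec[i] for vec in vectors) / len(vectors) for i in range(dim)]


def is_new_word(
    candidate: str,
    subparts: Sequence[str],
    embeddings: Dict[str, List[float]],
    threshold: float,
) -> bool:
    """Accept the candidate when it is semantically far from its sub-parts."""
    whole = embeddings[candidate]
    parts = mean_vector([embeddings[w] for w in subparts])
    return cosine(whole, parts) < threshold


# Made-up 3-dimensional vectors, for illustration only.
embeddings = {
    "新三板": [0.9, 0.1, 0.0],
    "新": [0.0, 1.0, 0.0],
    "三板": [0.1, 0.0, 1.0],
    "这本书": [0.5, 0.5, 0.0],
    "这": [0.6, 0.4, 0.0],
    "本书": [0.4, 0.6, 0.0],
}
print(is_new_word("新三板", ["新", "三板"], embeddings, threshold=0.5))  # True
print(is_new_word("这本书", ["这", "本书"], embeddings, threshold=0.5))  # False
```

A low similarity suggests the candidate means something its parts do not (as with 新三板), while a junk string such as 这本书 ("this book") stays close to the meaning of its parts and is rejected.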

Description

Technical Field

[0001] The invention relates to the field of natural language processing, and in particular to a new word discovery method, device, terminal, and server.

Background

[0002] In practical natural language applications, some scenarios need to identify words with new, specific meanings, that is, new words, such as "New Third Board", "warning stocks", and "funds of funds". It is therefore necessary to perform new word extraction on text or corpora.

[0003] In the prior art, new word extraction is mainly based on statistics or rules. Rule-based methods usually rely on the internal grammatical rules of new words or on their prefix and suffix rules, and discover new words according to these criteria. Statistics-based methods generally look for statistics that characterize new words; commonly used statistics include word formation probability, mutual information, rigidity, etc.; and extract candidate w...
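As an illustration of the statistical approach described in [0003], the following Python sketch scores adjacent character pairs by pointwise mutual information (PMI), one common form of the mutual-information statistic, and keeps frequent, high-scoring pairs as candidates. It only handles two-character strings. The corpus and the `min_pmi` and `min_count` values are hypothetical; as [0004] notes, choosing such thresholds is exactly the difficult part.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def bigram_pmi(sentences: Sequence[str]) -> Dict[str, float]:
    """PMI(xy) = log(p(xy) / (p(x) * p(y))) for every adjacent character pair."""
    char_counts: Counter = Counter()
    pair_counts: Counter = Counter()
    for sentence in sentences:
        char_counts.update(sentence)  # counts each character
        pair_counts.update(sentence[i:i + 2] for i in range(len(sentence) - 1))
    total_chars = sum(char_counts.values())
    total_pairs = sum(pair_counts.values())
    scores = {}
    for pair, count in pair_counts.items():
        p_xy = count / total_pairs
        p_x = char_counts[pair[0]] / total_chars
        p_y = char_counts[pair[1]] / total_chars
        scores[pair] = math.log(p_xy / (p_x * p_y))
    return scores


def extract_candidates(
    sentences: Sequence[str], min_pmi: float, min_count: int
) -> List[str]:
    """Keep pairs that occur at least `min_count` times and score at least `min_pmi`."""
    pair_counts = Counter(
        s[i:i + 2] for s in sentences for i in range(len(s) - 1)
    )
    scores = bigram_pmi(sentences)
    return sorted(
        pair for pair, count in pair_counts.items()
        if count >= min_count and scores[pair] >= min_pmi
    )


corpus = ["新三板上市", "新三板公司", "公司上市"]  # made-up toy corpus
print(extract_candidates(corpus, min_pmi=2.0, min_count=2))
```

Even in this tiny corpus, the pairs that pass include fragments such as 新三 that are not words on their own, which illustrates why statistical output still needs further filtering.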

Application Information

Patent Type & Authority: Applications (China)
IPC (IPC8): G06F17/27
CPC: G06F40/284, G06F40/289
Inventors: Xie Yu (谢瑜), Zhang Hao (张昊), Zhu Pinpin (朱频频)
Owner: SHANGHAI XIAOI ROBOT TECH CO LTD