New word discovery method and device

A new word discovery technology, applied in the field of intelligent interaction, that addresses the limited accuracy of existing new word discovery, with the stated effects of improving update efficiency and reducing the amount of calculation.

Active Publication Date: 2015-12-23
SHANGHAI XIAOI ROBOT TECH CO LTD

AI Technical Summary

Problems solved by technology

[0005] The accuracy of new word discovery in existing technology needs to be improved.



Embodiment Construction

[0057] The inventors have found through research that existing new word discovery methods judge only how tightly the words within a candidate data string combine with one another, and treat the candidate data strings whose words combine most tightly as new words. However, the words in some candidate data strings combine more tightly with words outside the string, so the string itself is not suitable as a new word. Judging only the relationship between the words inside a candidate data string therefore does not give sufficiently accurate results.

[0058] In the embodiment of the present invention, the information entropy of each word in the candidate data string and its outer words is calculated, and candidate data strings for which this information entropy falls outside the preset range are removed. This excludes candidate data strings whose words are better suited to combining with outer words, and can improve the...
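The entropy-based filter described in [0058] can be sketched as follows. This is a minimal illustration, not the patented calculation: it assumes the common left/right neighbor (branching) entropy formulation, in which the distribution of words appearing immediately outside a candidate is measured. The function names and the `low`/`high` bounds standing in for the "preset range" are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(counts):
    """Shannon entropy (in bits) of a Counter of observed neighbor words."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def neighbor_entropies(sentences, candidate):
    """Entropy of the words directly left and right of each occurrence.

    `sentences` is a list of token lists; `candidate` is a tuple of tokens.
    """
    n = len(candidate)
    left, right = Counter(), Counter()
    for tokens in sentences:
        for i in range(len(tokens) - n + 1):
            if tuple(tokens[i:i + n]) == candidate:
                if i > 0:
                    left[tokens[i - 1]] += 1
                if i + n < len(tokens):
                    right[tokens[i + n]] += 1
    return shannon_entropy(left), shannon_entropy(right)


def filter_candidates(sentences, candidates, low=1.0, high=float("inf")):
    """Keep candidates whose left and right entropies both lie in [low, high].

    The bounds are illustrative stand-ins for the preset range.
    """
    kept = []
    for cand in candidates:
        l_ent, r_ent = neighbor_entropies(sentences, cand)
        if low <= l_ent <= high and low <= r_ent <= high:
            kept.append(cand)
    return kept
```

Under this formulation, a candidate that always appears next to the same outer word has low neighbor entropy, which suggests it is part of a larger unit and is dropped. A candidate surrounded by varied outer words has high entropy and is kept.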


Abstract

Provided are a new word discovery method and device. The method comprises the following steps: the received corpus is preprocessed to obtain text data; the text data are split into lines to obtain sentence data; the sentence data are segmented according to the individual words contained in a dictionary to obtain word data; adjacent word data are then combined to generate candidate data strings; and the candidate data strings are subjected to judgment processing to discover new words. The judgment processing comprises calculating the information entropy of the words in each candidate data string and the words outside it, and removing the candidate data strings for which this information entropy is outside the preset range. The method and device can improve the accuracy of new word discovery.
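The front half of the pipeline in the abstract (preprocessing, line division, dictionary-based segmentation, combination of adjacent words) can be sketched as below. This is an illustrative sketch under stated assumptions: the source does not name a segmentation algorithm, so forward maximum matching is assumed here, and all function names, the tag-stripping rule, and the `max_len` and `max_words` parameters are hypothetical.

```python
import re


def preprocess(corpus):
    """Strip markup-like tags (assumed form of preprocessing)."""
    return re.sub(r"<[^>]+>", "", corpus)


def split_sentences(text):
    """Split text into sentences on newlines and sentence-final punctuation."""
    parts = re.split(r"[\n。！？!?.]+", text)
    return [p.strip() for p in parts if p.strip()]


def segment(sentence, dictionary, max_len=4):
    """Forward maximum matching: take the longest dictionary word at each
    position, falling back to a single character when nothing matches."""
    tokens, i = [], 0
    while i < len(sentence):
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            piece = sentence[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


def candidate_strings(tokens, max_words=2):
    """Combine 2..max_words adjacent tokens into candidate data strings."""
    cands = []
    for n in range(2, max_words + 1):
        for i in range(len(tokens) - n + 1):
            cands.append(tuple(tokens[i:i + n]))
    return cands
```

For example, with the toy dictionary `{"机器", "学习"}`, the sentence `机器学习很好` is segmented into `机器`, `学习`, `很`, `好`, and the adjacent pairs become the candidate data strings passed on to the judgment step.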

Description

Technical field

[0001] The invention relates to the field of intelligent interaction, in particular to a new word discovery method and device.

Background

[0002] In many fields of Chinese information processing, functions are implemented on the basis of dictionaries. For example, in an intelligent retrieval system or an intelligent dialogue system, word segmentation, question retrieval, similarity matching, and the determination of retrieval results or dialogue answers are all computed with the word as the smallest unit, and the basis of these computations is the word dictionary. The word dictionary therefore has a great impact on the performance of the entire system.

[0003] Social and cultural change and the rapid development of the economy and commerce often drive changes in language, and the most rapid manifestation of language change is the emergence of new words. Especially in a specific field, whether the wor...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
CPC: G06F16/3334, G06F16/3335, G06F16/353
Inventor: 张昊, 朱频频
Owner: SHANGHAI XIAOI ROBOT TECH CO LTD