Method and device for finding new word in text

A technique for discovering new words in text. It addresses the long running time and high computational cost of existing approaches, reducing complexity, ensuring effective storage, and saving time.

Active Publication Date: 2015-11-25
BEIJING JINGDONG SHANGKE INFORMATION TECH CO LTD +1

AI Technical Summary

Problems solved by technology

[0006] However, existing models such as hidden Markov models and conditional random fields still have shortcomings when used to discover new words in text: they all rely on manually identified features of characters and words, and summarizing those features requires a great deal of time spent observing large amounts of data.
Therefore, using hidden Markov models, conditional random fields and similar prior-art models is computationally expensive and time-consuming.



Embodiment Construction

[0025] Exemplary embodiments of the present invention are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present invention to facilitate understanding, and they should be regarded as exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

[0026] Figure 1 is a schematic diagram of the main steps of a method for discovering new words in text according to an embodiment of the present invention.

[0027] As shown in Figure 1, the method for finding new words in text according to this embodiment mainly includes the following steps:

[0028] Step S11: Separate each character in the text, and use ...


Abstract

The invention provides a method and a device for finding new words in a text. They automatically discover the features of the characters in a text and find new words by mining the similarity between the characters' feature vectors, which saves the time the prior art spends observing data to identify features and improves the efficiency of new word discovery. The method comprises: separating each character in the text and using a deep neural network algorithm to extract a feature vector for each character; calculating the cosine of the angle between the feature vectors of every two adjacent characters and ranking the results; and selecting, as a new word, a sequence of adjacent characters in which every adjacent pair has a cosine value larger than a preset threshold, and outputting the new word.
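The selection step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the abstract does not specify the deep neural network, so the per-character feature vectors are assumed to be supplied already (here as a plain dictionary), and every character in the text is assumed to have one. The function names `cosine` and `find_new_words`, the reading of "sequence combination" as a maximal run of characters whose adjacent cosines all exceed the threshold, and the ranking of candidates by their weakest adjacent cosine are all assumptions made for this sketch.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def find_new_words(
    text: str,
    vectors: Dict[str, Sequence[float]],  # hypothetical: output of some feature extractor
    threshold: float,
) -> List[Tuple[str, float]]:
    """Return candidate words with their weakest adjacent cosine, best first."""
    chars = list(text)
    # One cosine value per pair of adjacent characters.
    sims = [cosine(vectors[a], vectors[b]) for a, b in zip(chars, chars[1:])]

    candidates: Dict[str, float] = {}
    start = 0
    for i in range(len(chars)):
        # A run ends at the last character, or where the link to the
        # next character does not exceed the threshold.
        if i == len(sims) or sims[i] <= threshold:
            if i > start:  # keep runs of at least two characters
                word = "".join(chars[start:i + 1])
                score = min(sims[start:i])
                candidates[word] = max(score, candidates.get(word, score))
            start = i + 1

    # Assumption: rank candidates by their weakest adjacent cosine, descending.
    return sorted(candidates.items(), key=lambda kv: kv[1], reverse=True)
```

Calling `find_new_words(text, vectors, 0.9)` with vectors from any character-embedding model would return candidate words paired with a score. The threshold value is a tuning parameter; the source only says it is preset.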

Description

Technical field

[0001] The invention relates to the field of computer technology, in particular to a method and device for discovering new words in a text.

Background

[0002] With the continuous development of natural language processing technology in recent years, new word discovery has become more and more important (in this document, new word discovery means finding the words in a text in preparation for subsequent operations such as word segmentation, tagging, and topic extraction). Finding words can be regarded as the first and most important step in natural language processing: only once the words are available can subsequent operations such as word segmentation, tagging, and topic extraction be performed on text containing them. In addition, with the rapid increase of new words on the Internet, new word discovery technology should not only discover words that are not yet known, but also keep discovering the new words that emerge every day.

[0003]...


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/28
Inventor: 邵佳帅, 牟川, 邢志峰
Owner: BEIJING JINGDONG SHANGKE INFORMATION TECH CO LTD