Neologism discovering method and system

A new word discovery method and system in the field of information mining. It addresses the low quality of new words found by existing approaches, and it reduces manual workload while improving the efficiency and reliability of new word discovery.

Active Publication Date: 2016-10-19
IFLYTEK CO LTD

AI Technical Summary

Problems solved by technology

Although unsupervised learning does not require the support of a large number of training sets, the quality of t




Embodiment Construction

[0068] In order to enable those skilled in the art to better understand the solutions of the embodiments of the present invention, the embodiments of the present invention will be further described in detail below in conjunction with the drawings and implementations.

[0069] Figure 1 shows the flow chart of the new word discovery method of the embodiment of the present invention, which comprises the following steps:

[0070] Step 101, pre-training a new word discovery model based on the boundary features of word strings.

[0071] In the embodiment of the present invention, an existing system dictionary can be used to segment the training corpus; character string boundary features are extracted from the word segmentation result, and a classification method is then used to train the new word discovery model. The specific training process is as follows:

[0072] (1) Obtain training corpus.

[0073] The training corpus may be a large-scale corpus that includes a large nu...
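The excerpt does not say which boundary features or which segmentation algorithm the embodiment uses. The following is a minimal sketch under stated assumptions: segmentation is done by forward maximum matching against a dictionary, and the boundary features of a candidate word string are its frequency plus the entropy of its left and right neighbouring tokens. The function names (`segment`, `boundary_features`) and the parameters `max_len` and `max_tokens` are hypothetical and are not taken from the patent.

```python
import math
from collections import Counter, defaultdict


def segment(text, dictionary, max_len=4):
    """Forward maximum matching (an assumed segmenter): at each position take
    the longest dictionary entry, falling back to a single character."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


def entropy(counter):
    """Shannon entropy (in bits) of a Counter of neighbouring tokens."""
    total = sum(counter.values())
    if total == 0:
        return 0.0
    return -sum(c / total * math.log2(c / total) for c in counter.values())


def boundary_features(sentences, dictionary, max_tokens=2):
    """Map each candidate string (1..max_tokens adjacent tokens) to
    (frequency, left-neighbour entropy, right-neighbour entropy).
    These three statistics are assumed stand-ins for the patent's features."""
    left, right, freq = defaultdict(Counter), defaultdict(Counter), Counter()
    for sentence in sentences:
        # Sentence-boundary markers give edge candidates a neighbour too.
        tokens = ["<s>"] + segment(sentence, dictionary) + ["</s>"]
        for n in range(1, max_tokens + 1):
            for i in range(1, len(tokens) - n):
                cand = "".join(tokens[i:i + n])
                freq[cand] += 1
                left[cand][tokens[i - 1]] += 1
                right[cand][tokens[i + n]] += 1
    return {c: (freq[c], entropy(left[c]), entropy(right[c])) for c in freq}
```

A string that recurs with varied neighbours on both sides (high left and right entropy) behaves like a self-contained word, which is why such statistics are commonly used as boundary evidence. In a setup like this, feature vectors for strings already in the dictionary could serve as positive training examples for a classifier, but how the patent labels its training data is not stated in the excerpt.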


Abstract

The invention discloses a new word discovery method and system. The method comprises the following steps: pre-training a new word discovery model on word string boundary features, where the boundary features are statistical features of the word string; obtaining a new corpus; segmenting the new corpus and extracting boundary features from the segmentation result; classifying the boundary features with the new word discovery model to obtain a candidate set of new words; and determining the confidence of each candidate in the set to obtain the new words. The method and system can effectively discover new words in a corpus while reducing manual annotation workload.

Description

Technical field

[0001] The invention relates to the field of information mining, in particular to a new word discovery method and system.

Background technique

[0002] The rapid development and popularization of informatization, electronics, and networking have brought about an explosive growth of information, and a large number of new words that do not exist in traditional dictionaries continue to emerge, including new words on the Internet and various proper nouns. New words on the Internet are words created by users that have never appeared before and often carry specific meanings, such as "to force", "can't afford to hurt", "brother overcoat" and so on. Proper nouns, also called named entities, include specific appellations such as person names, place names, and institution names. With the rapid increase of various new words, in order to continuously improve the convenience of human-computer interaction, it is obviously necessary to continuously track a...


Application Information

IPC IPC(8): G06F17/30
Inventor 汪洋, 陈志刚, 胡国平, 胡郁, 刘庆峰
Owner IFLYTEK CO LTD