Method and system for discovering new words in open fields

A new-word discovery technology for open domains, applied in the field of intelligent information processing. It addresses the added difficulty of recognizing new words when the domain is not known in advance, and it improves recognition efficiency.

Inactive Publication Date: 2013-09-11
TSINGHUA UNIV

AI Technical Summary

Problems solved by technology

Because the division of domains and the number of possible domains are uncertain, even with a comprehensive domain-division rule it is very difficult to judge which domain a candidate new word belongs to. This increases the difficulty of new word recognition in the open domain.



Embodiment Construction

[0036] Embodiments of the present invention are described in detail below, and examples of the embodiments are shown in the drawings, wherein the same or similar reference numerals denote the same or similar elements or elements having the same or similar functions throughout. The embodiments described below by referring to the figures are exemplary only for explaining the present invention and should not be construed as limiting the present invention.

[0037] In the description of the present invention, it should be understood that the terms "first" and "second" are used for descriptive purposes only and cannot be interpreted as indicating or implying relative importance or as implicitly indicating the number of the indicated technical features. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of such features. In the description of the present invention, "plurality" means two or more, unless otherwise specifically defined.

[00...



Abstract

The invention provides a method and a system for discovering new words in open domains. The method includes: receiving corpora to be processed; performing format conversion and word segmentation on the corpora to obtain a plurality of text segments; extracting feature information from the text segments; for a subset of the text segments, judging whether combinations of adjacent segments are new words and, if so, labeling the new-word boundaries on those adjacent segments; estimating the parameters of a conditional random field model from the labeled segments and the feature information; and applying the estimated conditional random field model to the remaining text segments to obtain the new words they contain. Because boundary labeling is needed only for part of the text and the trained model identifies new words in the rest, the method can identify new words in various domains while improving identification efficiency.
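The boundary-labeling and feature-extraction steps in the abstract can be pictured with a minimal sketch. It is an illustration under stated assumptions, not the patented implementation: the B/I/O tag set, the function names `label_boundaries` and `token_features`, the `max_span` limit, and the chosen features are all hypothetical, and the conditional random field training step itself is not shown.

```python
from typing import Dict, List, Set


def label_boundaries(tokens: List[str], new_words: Set[str],
                     max_span: int = 3) -> List[str]:
    """Tag each segmented token as B (begins a new word), I (inside one) or O.

    `new_words` holds combinations of adjacent tokens that were judged to be
    new words. The B/I/O scheme is an assumption made for illustration.
    """
    labels = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest combination of adjacent tokens first (at least 2).
        for span in range(min(max_span, len(tokens) - i), 1, -1):
            if "".join(tokens[i:i + span]) in new_words:
                labels[i] = "B"
                for j in range(i + 1, i + span):
                    labels[j] = "I"
                i += span
                matched = True
                break
        if not matched:
            i += 1
    return labels


def token_features(tokens: List[str], i: int) -> Dict[str, object]:
    """Hypothetical per-token features that a sequence model could consume."""
    return {
        "token": tokens[i],
        "length": len(tokens[i]),
        "prev": tokens[i - 1] if i > 0 else "<BOS>",
        "next": tokens[i + 1] if i < len(tokens) - 1 else "<EOS>",
    }


# Segmenter output in which the unregistered word "微博" was split in two.
tokens = ["我", "在", "微", "博", "发", "帖"]
labels = label_boundaries(tokens, {"微博"})
features = [token_features(tokens, i) for i in range(len(tokens))]
for tok, lab in zip(tokens, labels):
    print(tok, lab)
```

Pairs of `features` and `labels` like these would form the training data from which the model parameters are estimated. The trained model would then predict tags for the remaining, unlabeled segments.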

Description

Technical Field

[0001] The invention relates to the technical field of intelligent information processing, and in particular to a method and system for discovering new words in open domains.

Background

[0002] Unlike English and other Western languages, Chinese has no fixed separator between words, so word segmentation is usually a necessary first step in Chinese information processing tasks. Existing studies have shown that words encountered during segmentation that are not included in the dictionary of the word segmentation tool (that is, unregistered words; the new words referred to in this document are unregistered words) significantly degrade segmentation performance. The discovery of new words is therefore of great significance for improving word segmentation and the work that follows it. In addition, in recent years, the emergence of web2.0 applications such as personal blogs, personalized signatures, and microblogs...
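The effect of unregistered words on segmentation can be seen in a small sketch. It assumes a dictionary-based forward maximum matching segmenter, which is a common baseline chosen here only for illustration; the source does not say which segmentation tool is used, and the function name `forward_max_match` and the toy dictionary are hypothetical.

```python
from typing import List, Set


def forward_max_match(text: str, dictionary: Set[str],
                      max_len: int = 4) -> List[str]:
    """Greedy left-to-right segmentation using the longest dictionary match.

    Characters not covered by any dictionary entry fall back to
    single-character tokens.
    """
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is always accepted as a last resort.
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


sentence = "我喜欢发微博"

# "微博" is missing from the dictionary, so it is fragmented.
without_new_word = forward_max_match(sentence, {"我", "喜欢", "发"})

# Once the new word is registered, it is kept whole.
with_new_word = forward_max_match(sentence, {"我", "喜欢", "发", "微博"})

print(without_new_word)
print(with_new_word)
```

In the first call the unregistered word is broken into single characters. Adjacent fragments of this kind are the combinations that a new-word discovery step would need to examine.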


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/27
Inventor: 陈飞, 刘奕群, 马少平, 张敏, 金奕江, 张阔
Owner: TSINGHUA UNIV