System and methods for carrying out semantic classification on unknown words

A technique for semantically classifying unknown words, applied in the field of natural language processing, that addresses problems such as unattainable performance and reduced classification accuracy, and achieves time savings, broad coverage, and improved work efficiency.

Inactive Publication Date: 2010-08-04
NEC (CHINA) CO LTD
0 Cites, 18 Cited by

AI Technical Summary

Problems solved by technology

If the idea of the second-type rule is directly extended to two-character words by voting and other...

Examples

Example 1

[0078] Example 1: Suppose an unknown word "muscle" is input. The similar word retrieval means 1202 can retrieve the similar word "skin and flesh" from the dictionary. Comparing the parts in which the two words differ, "muscle" and "skin", gives C(muscle) = C(skin) = c. Since C(skin and flesh) = c is known, it can be determined that the unknown word "muscle" also belongs to semantic class c, that is, C(muscle) = c.
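The rule applied in Example 1 can be sketched in Python as follows. This is only an illustration under stated assumptions, not the patent's implementation: words are modelled as tuples of character glosses, a "similar word" is assumed to be a dictionary word of the same length that differs in exactly one position, and the names `WORD_CLASS`, `CHAR_CLASS`, `differing_position` and `classify_by_similar_word` are hypothetical. Splitting the unknown word into a "muscle" part and a shared "flesh" part is likewise an assumption made for the toy data.

```python
from typing import Optional, Tuple

Word = Tuple[str, ...]

# Hypothetical toy data (not from the patent): a word is a tuple of
# character glosses, mapped to a semantic class label.
WORD_CLASS = {("skin", "flesh"): "c"}
# Hypothetical semantic classes of single characters.
CHAR_CLASS = {"muscle": "c", "skin": "c"}


def differing_position(a: Word, b: Word) -> Optional[int]:
    """Return the only index at which a and b differ, or None otherwise."""
    if len(a) != len(b):
        return None
    diffs = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    return diffs[0] if len(diffs) == 1 else None


def classify_by_similar_word(unknown: Word) -> Optional[str]:
    """Assign the class of a similar word when both differing parts share it."""
    for similar, word_class in WORD_CLASS.items():
        i = differing_position(unknown, similar)
        if i is None:
            continue  # not a similar word under the assumed definition
        # Require C(differing part of unknown) = C(differing part of similar)
        # = C(similar word), as in Example 1.
        if CHAR_CLASS.get(unknown[i]) == word_class == CHAR_CLASS.get(similar[i]):
            return word_class
    return None
```

Calling `classify_by_similar_word(("muscle", "flesh"))` on this toy data returns `"c"`, mirroring the conclusion C(muscle) = c; a word whose differing part has no known class is left unclassified.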

Example 2

[0079] Example 2: Assume that an unknown word "green vegetable" is input. The similar word retrieval means 1202 can retrieve three similar words, "green garlic", "green bean" and "green hemp", from the dictionary. These three similar words satisfy the following conditions: C(vegetable) = C(garlic) = C(green garlic) = c1, C(vegetable) = C(bean) = C(green bean) = c1, and C(vegetable) = C(hemp) = C(green hemp) = c2. That is, for the unknown word "green vegetable" there are two different classification results. As mentioned above, in this case the result selection means 1204 can select one of the results by voting. Since "green garlic" and "green bean" both support class c1 and only "green hemp" supports class c2, the unknown word "green vegetable" is classified into c1 by voting.
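The voting step in Example 2 can be illustrated with a short Python sketch. It rests on several assumptions that are not in the source: words are tuples of character glosses, a similar word differs from the unknown word in exactly one position, and a single character may carry more than one semantic class (which Example 2 implies for "vegetable"). The names `WORD_CLASS`, `CHAR_CLASSES`, `collect_votes` and `classify_by_vote` are hypothetical.

```python
from collections import Counter
from typing import Optional, Tuple

Word = Tuple[str, ...]

# Hypothetical toy dictionary mirroring Example 2.
WORD_CLASS = {
    ("green", "garlic"): "c1",
    ("green", "bean"): "c1",
    ("green", "hemp"): "c2",
}
# Assumption: a character may belong to several semantic classes.
CHAR_CLASSES = {
    "vegetable": {"c1", "c2"},
    "garlic": {"c1"},
    "bean": {"c1"},
    "hemp": {"c2"},
}


def collect_votes(unknown: Word) -> Counter:
    """Count one vote per similar word whose class is supported by both differing parts."""
    votes: Counter = Counter()
    for similar, word_class in WORD_CLASS.items():
        if len(similar) != len(unknown):
            continue
        diffs = [i for i in range(len(unknown)) if unknown[i] != similar[i]]
        if len(diffs) != 1:
            continue  # not a similar word under the assumed definition
        i = diffs[0]
        if (word_class in CHAR_CLASSES.get(unknown[i], set())
                and word_class in CHAR_CLASSES.get(similar[i], set())):
            votes[word_class] += 1
    return votes


def classify_by_vote(unknown: Word) -> Optional[str]:
    """Return the class with the most votes, or None if no similar word qualifies."""
    votes = collect_votes(unknown)
    if not votes:
        return None
    return votes.most_common(1)[0][0]
```

For `("green", "vegetable")` the tally is two votes for c1 and one for c2, so `classify_by_vote` returns `"c1"`. How ties should be broken is not stated in the visible text; this sketch simply takes whatever `Counter.most_common` lists first.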

[0080] The three semantic classification methods for unknown words according to the first, second and third embodiments of the present invention have been described in d...

Abstract

The invention provides a system and methods for semantically classifying unknown words. In a first method, the set of similar words for each word root in a dictionary is divided into groups by semantic class, and the semantic class of the group containing the most similar words is recorded and used to classify unknown words. In a second method, each single character belonging to each semantic class in the dictionary is examined: if a word obtained by expanding the character is a similar word (i.e., it has the same semantic class), the unknown word can be assigned to that semantic class with greater confidence. In a third method, the similar words of the input unknown word are examined: if the differing part of a similar word has the same semantic class as the whole similar word, the unknown word can likewise be assigned to that semantic class with greater confidence. The semantic classification methods and system according to the invention apply to words with any number of characters, so higher coverage and higher classification accuracy can be achieved.
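As a rough illustration of the first method only, the following Python sketch groups dictionary words by word root, tallies semantic classes within each group, and records the majority class per root. The abstract does not say how a word root is extracted, so `root_of` is a hypothetical caller-supplied function (here, the last element of the word), and the toy dictionary and the names `build_root_table` and `classify_by_root` are likewise assumptions.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, Optional, Tuple

Word = Tuple[str, ...]


def build_root_table(dictionary: Dict[Word, str],
                     root_of: Callable[[Word], str]) -> Dict[str, str]:
    """For each root, record the semantic class of its largest group of words."""
    groups: Dict[str, Counter] = defaultdict(Counter)
    for word, semantic_class in dictionary.items():
        groups[root_of(word)][semantic_class] += 1
    return {root: counts.most_common(1)[0][0] for root, counts in groups.items()}


def classify_by_root(unknown: Word, table: Dict[str, str],
                     root_of: Callable[[Word], str]) -> Optional[str]:
    """Look up the recorded class for the unknown word's root, if any."""
    return table.get(root_of(unknown))


# Hypothetical toy dictionary; the entries are illustrative, not from the patent.
TOY_DICTIONARY = {
    ("oak", "tree"): "plant",
    ("pine", "tree"): "plant",
    ("family", "tree"): "abstract",
}


def last_part(word: Word) -> str:
    """Assumed root extraction: treat the final element as the word root."""
    return word[-1]


ROOT_TABLE = build_root_table(TOY_DICTIONARY, last_part)
```

With this toy data, `classify_by_root(("elm", "tree"), ROOT_TABLE, last_part)` returns `"plant"`, because two of the three dictionary words sharing the root "tree" fall in that class.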

Description

Technical field

[0001] The present invention relates to natural language processing, and more particularly to systems and methods for semantically classifying unknown words.

Background

[0002] With the rapid development of computers and the Internet, a large amount of text information is being generated. As this text proliferates, users increasingly want it to be processed automatically, with less manual involvement.

[0003] Typically, users can obtain dictionaries that describe words in advance. Such dictionaries can describe a word by part of speech (e.g., noun, verb, adjective), semantic class (e.g., person, event, emotion), meaning, and example sentences, and they are a great help in text processing.

[0004] Words that do not appear in the dictionary are called "unknown words". Generally speaking, unknown words can be derived from some new words. In text analysis wo...

Application Information

IPC(8): G06F17/27
Inventor: 赵凯, 胡长建, 邱立坤
Owner: NEC (CHINA) CO LTD