A New Term Recognition Method

A new term recognition method, applied in the fields of Chinese natural language processing and automatic recognition of new Chinese words, addresses the problem that new and meaningful terms are easily missed, and achieves good recognition performance with high precision and high recall.

Active Publication Date: 2020-04-14
中科国力(镇江)智能技术有限公司

AI Technical Summary

Problems solved by technology

[0005] Problem 1: New term recognition accuracy problem
[0006] Problem 2: New term recognition breadth problem
Because words can be combined in many ways, automatic recognition easily misses new and meaningful terms.

Embodiment Construction

[0044] To describe the present invention more clearly, the following terms are defined and explained:

[0045] (1) Word length: the length of the word. A Chinese word is made up of one or more Chinese characters, and the length of a word is equal to the number of characters contained in the word. A word with a word length of 1 is called a one-character word, a word with a word length of 2 is called a two-character word, a word with a word length of 3 is called a three-character word, and so on.

[0046] (2) Multi-character words: words that consist of three or more Chinese characters and have a definite meaning are called multi-character words, such as "the spirit of the Communist Party of China" and "positive energy". The former is a four-character word, and the latter is a three-character word.

[0047] (3) Dictionary: a list of words, where the words can be single-character words (i.e., the word length is 1), two-character words (i...
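The definitions of word length and multi-character words above can be illustrated with a short Python sketch. The function names and the sample dictionary are hypothetical and do not come from the patent text. The check covers only the length criterion; whether a string "has a definite meaning" cannot be decided by code like this.

```python
def word_length(word: str) -> int:
    """Word length = number of characters in the word (definition 1)."""
    return len(word)


def is_multi_character_word(word: str) -> bool:
    """Length criterion of definition 2: three or more characters.

    Assumption: the 'definite meaning' condition is checked elsewhere.
    """
    return word_length(word) >= 3


# Hypothetical dictionary: a set of words of various lengths (definition 3).
dictionary = {"人", "能量", "正能量"}

# Group the dictionary entries by word length.
by_length = {}
for word in dictionary:
    by_length.setdefault(word_length(word), set()).add(word)
```

Here "正能量" ("positive energy") has word length 3, so it satisfies the length criterion for a multi-character word, while "能量" does not.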

Abstract

The invention relates to an efficient system and method for new term identification. The system comprises a text word sequence module A, which performs word segmentation on each input document in a text library RCorpus; a new term identification module B, which performs new term identification on the word sequence of each segmented document in a text library TCorpus; and a verification module C, which verifies the identified new terms. The method comprises three steps. First, module A performs word segmentation on each input text in the text library RCorpus to form a text word sequence. Second, module B performs new term identification on the word sequence of each segmented text in the text library TCorpus. Third, module C verifies the identified new terms. The system and method achieve high precision and recall; the identification precision for new terms is 93.8%.
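The three-module flow in the abstract can be sketched as a small Python pipeline. The abstract does not say how each module works internally, so everything below is an assumption made for illustration: module A is stood in for by forward maximum matching against a dictionary, module B proposes concatenations of adjacent words that are not in the dictionary, module C keeps candidates that occur at least `min_count` times, and TCorpus is taken to be the output of module A. None of these choices is confirmed by the patent text, and all names are hypothetical.

```python
from collections import Counter


def segment(text, dictionary, max_len=4):
    """Forward maximum matching (an assumed stand-in for module A)."""
    words, i = [], 0
    while i < len(text):
        # Try the longest piece first; fall back to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                words.append(piece)
                i += size
                break
    return words


def module_a(rcorpus, dictionary):
    """Segment every document; the result is treated here as TCorpus."""
    return [segment(doc, dictionary) for doc in rcorpus]


def module_b(tcorpus, dictionary):
    """Count adjacent-word concatenations that are not dictionary words."""
    counts = Counter()
    for words in tcorpus:
        for left, right in zip(words, words[1:]):
            candidate = left + right
            if candidate not in dictionary:
                counts[candidate] += 1
    return counts


def module_c(candidates, min_count=2):
    """Verify candidates with a simple frequency threshold (assumption)."""
    return {term for term, n in candidates.items() if n >= min_count}


# Hypothetical toy data.
dictionary = {"正", "能量", "传递", "我们"}
rcorpus = ["我们传递正能量", "正能量传递"]

tcorpus = module_a(rcorpus, dictionary)
new_terms = module_c(module_b(tcorpus, dictionary))
```

On this toy input, "正能量" is the only candidate that appears in both documents, so it is the only one that passes the threshold. A real system would need a proper segmenter, punctuation handling, and a stronger verification step than a raw count.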

Description

Technical field

[0001] The invention relates to the fields of Chinese natural language processing and automatic recognition of new Chinese words, and in particular to a method for automatically recognizing new terms.

Background technique

[0002] With the rapid development of the Internet, new terms emerge in an endless stream, which creates great difficulties for natural language processing applications, automated software (such as word segmentation systems), and dictionary compilation work.

[0003] Research on the identification of new terms has been carried out for many years. Existing methods fall into the following three categories. The first is the statistics-based approach. For example, Kenneth Ward Church, Béatrice Daille, and others use mutual information to extract fixed combinations and collocations of words. They hold that frequently co-occurring adjacent character combinations are generally terms, and then use mutual infor...
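The excerpt breaks off before explaining how mutual information is applied, so the following is only a minimal sketch of the standard pointwise mutual information score for adjacent character pairs, PMI(x, y) = log2(p(xy) / (p(x) p(y))). It illustrates the statistics-based prior art mentioned in [0003], not the method claimed by the patent. The function name, the probability estimates, and the sample texts are assumptions.

```python
import math
from collections import Counter


def pmi_scores(texts):
    """Pointwise mutual information for each adjacent character pair.

    Probabilities are estimated by relative frequency over all texts
    (an assumption; other estimators and smoothing are possible).
    """
    char_counts = Counter()
    pair_counts = Counter()
    for text in texts:
        char_counts.update(text)
        pair_counts.update(zip(text, text[1:]))

    total_chars = sum(char_counts.values())
    total_pairs = sum(pair_counts.values())

    scores = {}
    for (x, y), n in pair_counts.items():
        p_xy = n / total_pairs
        p_x = char_counts[x] / total_chars
        p_y = char_counts[y] / total_chars
        scores[x + y] = math.log2(p_xy / (p_x * p_y))
    return scores


# Hypothetical sample: rank adjacent pairs from highest to lowest score.
sample_scores = pmi_scores(["正能量很大", "传递正能量"])
ranked = sorted(sample_scores, key=sample_scores.get, reverse=True)
```

Pairs that co-occur more often than their individual character frequencies would predict receive higher scores, which is the intuition behind treating frequently co-occurring adjacent combinations as candidate terms.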

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F40/279, G06F40/284
CPC: G06F40/279, G06F40/284
Inventor: 符建辉, 王卫明, 曹阳
Owner: 中科国力(镇江)智能技术有限公司