
Self-adaptive Chinese new word recognition method and system

A self-adaptive new word recognition technology, applied in natural language data processing, instruments, electrical digital data processing, etc. It addresses two problems: manually set thresholds reduce new word recognition, and new word detection degrades when the training corpus does not match the target text.

Active Publication Date: 2020-06-26
BEIJING FORESTRY UNIVERSITY
Cites: 6; Cited by: 2

AI Technical Summary

Problems solved by technology

In addition, although the statistical features defined by such methods can characterize new words to a certain extent, their thresholds are set manually based on experience.
This easily causes a problem: a threshold set too high reduces the number of new words recognized, while a threshold set too low introduces a large number of irrelevant strings.
Because new word detection depends on the training corpus, detection performance suffers when the type of the training corpus does not match the type of the target text.



Examples


Embodiment Construction

[0061] The technical solutions in the embodiments of the present invention will be clearly and completely described below in conjunction with the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only part of the embodiments of the present invention, not all of them. Based on the embodiments of the present invention, all other embodiments obtained by persons of ordinary skill in the art without making creative efforts belong to the protection scope of the present invention.

[0062] According to one embodiment of the present invention, a self-adaptive Chinese new word recognition method is proposed, comprising the following steps:

[0063] Document initialization step: The Chinese text to be processed may contain Chinese characters, punctuation, format symbols, numbers, non-Chinese characters, and any other symbols that can appear in a text. The function of document initialization is to perform structural process...
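The initialization step (described more fully in the abstract: a wide-character sequence that keeps the original Chinese characters and their adjacency and interval relations) can be sketched as follows. This is a minimal illustration under assumptions, not the patented procedure: "Chinese character" is taken to mean a code point in the CJK Unified Ideographs block U+4E00 to U+9FFF, and every run of other content is collapsed into one separator marker (`None`). The function names `initialize` and `adjacent_pairs` are hypothetical.

```python
import re

# Assumption: a "Chinese character" is any code point in U+4E00..U+9FFF.
_HAN = re.compile(r"[\u4e00-\u9fff]")
SEP = None  # marker standing in for a run of non-Chinese content


def initialize(text: str) -> list:
    """Keep Chinese characters in order; collapse each run of punctuation,
    digits, Latin letters, whitespace, etc. into a single SEP marker.

    Characters that were adjacent stay adjacent; characters that were
    separated by other content stay separated, so no spurious adjacent
    pair is created across a comma or a number.
    """
    seq = []
    for ch in text:
        if _HAN.fullmatch(ch):
            seq.append(ch)
        elif seq and seq[-1] is not SEP:
            seq.append(SEP)
    if seq and seq[-1] is SEP:  # drop a trailing separator
        seq.pop()
    return seq


def adjacent_pairs(seq):
    """Yield (left, right) pairs of directly adjacent Chinese characters."""
    for a, b in zip(seq, seq[1:]):
        if a is not SEP and b is not SEP:
            yield a, b


print(initialize("新词，识别2020年"))  # ['新', '词', None, '识', '别', None, '年']
```

Collapsing a whole run (rather than dropping it) is what preserves the "interval relation": 词 and 识 above are never counted as an adjacent pair.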


Abstract

The invention provides a self-adaptive Chinese new word recognition method and system. The method comprises the following steps: a text initialization step, which performs structural processing on the input text to obtain a wide-character sequence that contains the original Chinese characters and preserves their adjacency and interval relations in the original text; a non-accidental co-occurrence judgment step, which uses a binomial distribution or a Poisson distribution to approximate the probability distribution of a pair of Chinese characters occurring adjacently one after the other, and determines, at a given non-accidental co-occurrence significance level α_p, all adjacent Chinese character pairs in the text whose co-occurrence is non-accidental; a statistical relevance judgment step, which, given a relevance significance level α_k, judges the degree of relevance between adjacent Chinese characters in the text and screens out Chinese character strings with strong internal relevance; and an existing-lexicon filtering step, which, based on an existing dictionary, selects, from the character strings that satisfy non-accidental co-occurrence and have strong internal relevance, those that do not yet appear in the dictionary.
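The non-accidental co-occurrence judgment and the lexicon filtering step can be illustrated with a small sketch. It rests on assumptions not stated in the abstract: the null model places characters independently, so the expected count of pair (a, b) over m adjacent positions is λ = m·p_a·p_b with p_a = n_a/n; the Poisson approximation is used for the right-tail p-value; runs of consecutive significant pairs are chained into candidate strings; and the α_k relevance step is omitted because its test is not specified here. All function names are hypothetical.

```python
import math
from collections import Counter


def poisson_sf(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam); each term is computed in log space."""
    if k <= 0:
        return 1.0

    def log_pmf(i: int) -> float:
        return -lam + i * math.log(lam) - math.lgamma(i + 1)

    if k <= lam:
        # The upper tail is large here; compute it as 1 - P(X < k).
        return max(0.0, 1.0 - sum(math.exp(log_pmf(i)) for i in range(k)))
    # For i >= k > lam the terms shrink monotonically: sum until negligible.
    total, i = 0.0, k
    while True:
        term = math.exp(log_pmf(i))
        total += term
        if term <= total * 1e-16:
            break
        i += 1
    return min(total, 1.0)


def non_accidental_pairs(runs, alpha_p):
    """Return {(a, b): p_value} for adjacent pairs (within each run of Chinese
    characters) seen significantly more often than independence predicts."""
    char_count = Counter(ch for run in runs for ch in run)
    n = sum(char_count.values())
    pair_count = Counter((run[i], run[i + 1])
                         for run in runs for i in range(len(run) - 1))
    m = sum(pair_count.values())  # number of adjacent positions
    significant = {}
    for (a, b), k in pair_count.items():
        lam = m * (char_count[a] / n) * (char_count[b] / n)  # assumed null model
        p_value = poisson_sf(k, lam)
        if p_value < alpha_p:
            significant[(a, b)] = p_value
    return significant


def candidate_strings(runs, significant):
    """Chain consecutive significant pairs inside each run into maximal strings."""
    out = set()
    for run in runs:
        start = 0
        for i in range(1, len(run) + 1):
            if i == len(run) or (run[i - 1], run[i]) not in significant:
                if i - start >= 2:
                    out.add(run[start:i])
                start = i
    return out


def new_words(runs, dictionary, alpha_p=0.01):
    """Keep only candidates absent from the existing dictionary."""
    cands = candidate_strings(runs, non_accidental_pairs(runs, alpha_p))
    return sorted(s for s in cands if s not in dictionary)


runs = ["甲乙"] * 10 + list("丙丁戊己庚辛壬癸子丑")
print(new_words(runs, set()))  # ['甲乙']
```

Here 甲乙 appears 10 times while independence predicts about 1.1 occurrences, so the pair is judged non-accidental; passing `{"甲乙"}` as the dictionary would filter it out as an already known word.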

Description

Technical field

[0001] The invention belongs to the field of Chinese language and text information processing and relates to technical fields such as Chinese new word recognition, semantic analysis, automatic translation, information retrieval, and Chinese word segmentation; in particular, it relates to a Chinese new word recognition method and system based on accidental co-occurrence judgment and relevance judgment.

Background

[0002] With the development of the Internet and artificial intelligence technology, demand for applications such as semantic analysis, automatic translation, and information extraction and retrieval continues to grow, and all of them use Chinese words as the basic unit of processing. However, unlike languages written in the Latin alphabet, Chinese does not use spaces to separate words. Therefore, when processing Chinese text, the text must first be segmented according to existing lexicons such as dictionaries.

[0003] Howe...
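To illustrate what dictionary-based segmentation does with a word missing from the lexicon, here is a sketch of forward maximum matching, a common dictionary-based baseline. The patent excerpt does not say which segmentation algorithm it has in mind; `fmm_segment`, the `max_len` window, and the sample dictionary are assumptions for illustration only.

```python
def fmm_segment(text: str, dictionary: set, max_len: int = 4) -> list:
    """Forward maximum matching: at each position take the longest
    dictionary word (up to max_len characters); fall back to one character."""
    words = []
    i = 0
    while i < len(text):
        # Shrink the window from the longest candidate down to one character.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in dictionary or j == i + 1:
                words.append(text[i:j])
                i = j
                break
    return words


lexicon = {"新词", "识别", "方法"}
print(fmm_segment("新词识别方法", lexicon))  # ['新词', '识别', '方法']
print(fmm_segment("网红识别", lexicon))      # ['网', '红', '识别']
```

The second call shows the failure mode that motivates new word recognition: 网红 is not in the lexicon, so it is broken into two single characters.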


Application Information

IPC(8): G06F40/216; G06F40/289; G06F40/30
CPC: Y02D10/00
Inventors: 蒋东辰 (Jiang Dongchen), 唐帅 (Tang Shuai), 蒋翱远 (Jiang Aoyuan), 牛颖 (Niu Ying)
Owner: BEIJING FORESTRY UNIVERSITY