Automatic Chinese idiom recognition method combining idiom dictionary and synonym replacement strategy

An automatic recognition technique based on a synonym replacement strategy, applied in natural language data processing, instruments, electrical digital data processing, etc. It addresses the loss of word segmentation accuracy caused by unrecognized idioms, and is described as effective, conceptually simple, and efficient.

Status: Inactive; Publication Date: 2020-11-20
JINLING INST OF TECH

AI Technical Summary

Problems solved by technology

Researchers have already studied the recognition of unregistered words and quantifiers and applied it to Chinese word segmentation, which has improved segmentation accuracy. However, because idioms have complex and diverse structures, few researchers have studied them independently, so the accuracy of word segmentation is affected to some extent.




Embodiment Construction

[0029] The automatic Chinese idiom recognition method of the present invention combines an idiom dictionary with a synonym replacement strategy, and specifically comprises the following steps:

[0030] Step 1. Read the text to be processed sentence by sentence, and search for a match in the dictionary;

[0031] This step uses 2 dictionaries: the idiom dictionary, and the dictionary of common sayings, proverbs, and xiehouyu (allegorical sayings). Entries in the idiom dictionary are generally 4 characters long, while entries in the dictionary of common sayings, proverbs, and xiehouyu vary in length; some are longer and some are shorter. Both dictionaries need to be inverted-indexed in advance for fast lookup.
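The inverted indexing mentioned in [0031] can be sketched roughly as follows. This is a minimal illustration, not the patent's implementation: it assumes a character-level index (each character maps to the set of entries containing it), and the function names `build_inverted_index` and `lookup` as well as the sample entries are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(entries):
    """Map each character to the set of dictionary entries that contain it."""
    index = defaultdict(set)
    for entry in entries:
        for ch in set(entry):
            index[ch].add(entry)
    return index


def lookup(index, fragment):
    """Return True if `fragment` is exactly one of the indexed entries."""
    if not fragment:
        return False
    # Intersect the posting sets of every character in the fragment;
    # .get() avoids inserting new keys into the defaultdict.
    postings = [index.get(ch, set()) for ch in fragment]
    common = set.intersection(*postings)
    return fragment in common


# Illustrative sample entries only (assumed, not taken from the patent).
IDIOM_ENTRIES = ["画蛇添足", "守株待兔"]
SAYING_ENTRIES = ["开夜车", "三个臭皮匠，顶个诸葛亮"]

idiom_index = build_inverted_index(IDIOM_ENTRIES)
saying_index = build_inverted_index(SAYING_ENTRIES)

print(lookup(idiom_index, "画蛇添足"))
print(lookup(saying_index, "开夜车"))
```

Intersecting the posting sets narrows the candidates quickly when the dictionary is large; for a small dictionary a plain Python `set` of entries would serve equally well.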

[0032] (1) Lookup in idiom dictionary

[0033] During the search process, the search window is set to 4, and the strategy of combining forward matching and reverse matching is adopted;
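One possible reading of [0033] is sketched below: a fixed window of 4 characters slides over the sentence once from left to right and once from right to left. The text does not say how the two passes are combined, so taking the union of their matches is an assumption here, and the names `forward_match`, `reverse_match`, and `find_idioms` are hypothetical.

```python
WINDOW = 4  # search window from [0033]


def forward_match(text, idioms, window=WINDOW):
    """Scan left to right; on a hit, jump past the matched window."""
    spans = []
    i = 0
    while i + window <= len(text):
        if text[i:i + window] in idioms:
            spans.append((i, i + window))
            i += window
        else:
            i += 1
    return spans


def reverse_match(text, idioms, window=WINDOW):
    """Scan right to left; on a hit, jump before the matched window."""
    spans = []
    j = len(text)
    while j - window >= 0:
        if text[j - window:j] in idioms:
            spans.append((j - window, j))
            j -= window
        else:
            j -= 1
    return spans


def find_idioms(text, idioms, window=WINDOW):
    """Assumed combination rule: union of forward and reverse matches."""
    spans = set(forward_match(text, idioms, window))
    spans |= set(reverse_match(text, idioms, window))
    return [(start, text[start:end]) for start, end in sorted(spans)]


# Illustrative dictionary and sentence (assumed sample data).
idiom_set = {"画蛇添足", "守株待兔"}
sentence = "他总是画蛇添足，又守株待兔。"
print(find_idioms(sentence, idiom_set))
```

The two passes can disagree when dictionary entries overlap in the text, which is the usual motivation for running both directions.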

[0034] (2) Search for common sayings, proverbs and allegorical sayings

[0035] In the search process, the search windo...



Abstract

The invention discloses an automatic Chinese idiom recognition method. It belongs to the technical field of natural language processing and aims to recognize idioms, common sayings, proverbs, and xiehouyu (allegorical sayings) in Chinese. The basic idea rests on the non-compositionality of idioms: if one word of an idiom is replaced with a synonym, the mutual information value after replacement differs greatly from the value before replacement. Recognition is divided into two steps. First, the candidate is looked up in the dictionaries of idioms, common sayings, proverbs, and xiehouyu; if it is found, it is determined to be an idiom. If it is not found, that alone does not show that the candidate is not an idiom, so the second step is executed: the change in the mutual information value before and after replacement is calculated by the synonym replacement method, and if the difference is larger than a threshold, the candidate is recognized as an idiom. The method can effectively improve the accuracy of Chinese word segmentation and provides technical support for the early-stage corpus processing of related research.
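The two-step decision described in the abstract might look like the sketch below. The abstract does not specify how mutual information is estimated, which word is replaced, or how several replacements are aggregated, so everything concrete here is an assumption: pointwise mutual information over a two-word candidate, a small additive smoothing constant, the mean absolute change over all available substitutions, and entirely made-up toy counts, synonym table, and threshold.

```python
import math


def pmi(w1, w2, unigram, bigram, total, smoothing=0.5):
    """Pointwise mutual information log2(P(w1,w2) / (P(w1) * P(w2)))."""
    p_xy = (bigram.get((w1, w2), 0) + smoothing) / total
    p_x = unigram[w1] / total
    p_y = unigram[w2] / total
    return math.log2(p_xy / (p_x * p_y))


def mi_change(pair, synonyms, unigram, bigram, total):
    """Mean absolute PMI change over all single-word synonym replacements."""
    original = pmi(pair[0], pair[1], unigram, bigram, total)
    changes = []
    for pos in (0, 1):
        for syn in synonyms.get(pair[pos], []):
            if unigram.get(syn, 0) == 0:
                continue  # no corpus evidence for this synonym
            replaced = list(pair)
            replaced[pos] = syn
            new_value = pmi(replaced[0], replaced[1], unigram, bigram, total)
            changes.append(abs(original - new_value))
    return sum(changes) / len(changes) if changes else 0.0


def recognize(pair, dictionary, synonyms, unigram, bigram, total, threshold):
    """Step 1: dictionary lookup. Step 2: synonym-replacement PMI test."""
    if "".join(pair) in dictionary:
        return True
    return mi_change(pair, synonyms, unigram, bigram, total) > threshold


# Toy corpus statistics, invented purely for illustration.
TOTAL = 100_000
UNIGRAM = {"开": 2000, "驾驶": 500, "夜车": 60, "汽车": 1500}
BIGRAM = {("开", "夜车"): 50, ("开", "汽车"): 300, ("驾驶", "汽车"): 120}
SYNONYMS = {"开": ["驾驶"]}
THRESHOLD = 2.0  # hypothetical value

for candidate in [("开", "夜车"), ("开", "汽车")]:
    print(candidate, recognize(candidate, set(), SYNONYMS, UNIGRAM, BIGRAM, TOTAL, THRESHOLD))
```

With these toy counts, replacing 开 with 驾驶 collapses the association in 开夜车 but barely changes it in 开汽车, which is the contrast the non-compositionality argument relies on. A real system would need corpus-derived counts and a tuned threshold.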

Description

Technical field

[0001] The invention belongs to the technical field of natural language processing, and in particular relates to a method for the automatic recognition of Chinese idioms.

Background technique

[0002] In applications of artificial intelligence and natural language processing, Chinese word segmentation is the first step in corpus processing, and its accuracy directly affects the subsequent results. Important factors affecting the accuracy of Chinese word segmentation include the automatic recognition of unregistered words, numerals, idioms, and so on. Researchers have already studied the recognition of unregistered words and quantifiers and applied it to Chinese word segmentation, which has improved segmentation accuracy. However, because idioms have complex and diverse structures, few researchers have studied them independently, so the accuracy of word segmentation is affected to some extent.

[0003] Chinese idioms include id...


Application Information

IPC(8): G06F40/289, G06F40/242, G06F40/216
CPC: G06F40/216, G06F40/242, G06F40/289
Inventors: 梁颖红, 产昊鹏
Owner: JINLING INST OF TECH