A Chinese Noun Phrase Recognition Method Based on Simple Chinese Noun Phrases

A noun phrase recognition technology, applied in the fields of natural language processing and machine learning, that addresses the low recognition rate caused by structural complexity by reducing semantic complexity and thereby improving recognition performance.

Status: Inactive; Publication Date: 2018-12-21
DALIAN UNIV OF TECH
Cites: 8; Cited by: 0
AI Technical Summary

Problems solved by technology

[0007] The technical problem to be solved by the present invention is that, when machine learning methods are used to recognize Chinese MNPs (maximal noun phrases), the recognition rate is too low because of the length of the phrases and the complexity of their semantics and structure.

Examples

Embodiment Construction

[0043] Figure 1 is a flowchart of the method for recognizing maximal noun phrases based on Chinese simple noun phrases.

[0044] The present invention is described in detail below through a specific example that uses version 5.0 of the Penn Chinese Treebank as the data set, with reference to the accompanying drawings and the technical scheme.

[0045] 1. Data preprocessing of the Penn Chinese Treebank version 5.0

[0046] The corpus selected for the present invention is the Penn Chinese Treebank version 5.0, divided into a test corpus and a training corpus at a ratio of 1:4. Word segmentation and part-of-speech tagging are performed on the corpus with the Chinese lexical analysis tool ctbparser. Each word in the corpus is then assigned two classification labels using the IOB method, one for Chinese SNPs (simple noun phrases) and one for Chinese MNPs (maximal noun phrases). Take the sentence "Ensuring the orderly development of Pudong" as an example:

[0047] Word | Part of speech | Classification labe...
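The IOB labelling and the 1:4 split described in [0046] can be illustrated with a minimal Python sketch. It is not the patent's implementation: the function names `spans_to_iob` and `split_corpus`, the placeholder tokens `w0` to `w4`, the phrase spans, and the choice of sending every fifth sentence to the test set are all assumptions made for illustration (the source does not say how the split was drawn).

```python
from typing import List, Sequence, Tuple


def spans_to_iob(n_tokens: int, spans: Sequence[Tuple[int, int]]) -> List[str]:
    """Turn half-open [start, end) phrase spans into per-word IOB labels.

    B marks the first word of a phrase, I a word inside it, O a word outside.
    """
    labels = ["O"] * n_tokens
    for start, end in spans:
        labels[start] = "B"
        for i in range(start + 1, end):
            labels[i] = "I"
    return labels


def split_corpus(sentences: Sequence, test_every: int = 5):
    """Assumed split rule: every fifth sentence is test data (test:train = 1:4)."""
    train = [s for i, s in enumerate(sentences) if i % test_every != 0]
    test = [s for i, s in enumerate(sentences) if i % test_every == 0]
    return train, test


# Hypothetical five-word sentence; the spans are illustrative, not from the patent.
tokens = ["w0", "w1", "w2", "w3", "w4"]
snp_spans = [(1, 2), (3, 5)]   # two simple noun phrases
mnp_spans = [(1, 5)]           # one maximal noun phrase covering both

snp_labels = spans_to_iob(len(tokens), snp_spans)
mnp_labels = spans_to_iob(len(tokens), mnp_spans)

for word, snp, mnp in zip(tokens, snp_labels, mnp_labels):
    print(word, snp, mnp)
```

Each word thus carries two independent label columns, which is what allows an SNP model and an MNP model to be trained on the same segmented, tagged corpus.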

Abstract

The invention belongs to the natural language processing subfield of artificial intelligence and provides a method for recognizing Chinese maximal noun phrases (MNPs) based on Chinese simple noun phrases (SNPs). It includes the following steps: S1, preprocess the data; S2, select the SVM method, train a Chinese SNP recognition model, and recognize the Chinese SNPs; S3, simplify the text using the abbreviation replacement method to obtain new training and test corpora; S4, extract the sample set again from the new corpora, then train a model for and recognize the simplified Chinese MNPs; S5, restore the corpus, the restored Chinese MNPs being the final recognition result of the method. The method reduces the adverse effects that excessive length and complex semantics and structure have on automatic recognition of Chinese MNPs, and can thus effectively improve Chinese MNP recognition performance.
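Steps S3 and S5 can be sketched in Python as a collapse-and-restore pair. This is a sketch under stated assumptions, not the patented procedure: the visible text does not say what each SNP is replaced with, so the code assumes every recognized SNP is shortened to its last word, and the names `iob_to_spans`, `simplify`, and `restore` and the tokens `w0` to `w5` are hypothetical.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # half-open [start, end) token indices


def iob_to_spans(labels: List[str]) -> List[Span]:
    """Collect the phrase spans encoded by a sequence of B/I/O labels."""
    spans, start = [], None
    for i, label in enumerate(labels):
        if label in ("B", "O") and start is not None:
            spans.append((start, i))   # close the phrase that was open
            start = None
        if label == "B":
            start = i                  # open a new phrase
    if start is not None:
        spans.append((start, len(labels)))
    return spans


def simplify(tokens: List[str], snp_labels: List[str]):
    """S3 (assumed rule): replace each SNP by its last word.

    Returns the shortened token list and, for each shortened token,
    the original span it stands for.
    """
    snp_end = dict(iob_to_spans(snp_labels))
    simplified, mapping, i = [], [], 0
    while i < len(tokens):
        end = snp_end.get(i, i + 1)    # a whole SNP, or a single word
        simplified.append(tokens[end - 1])
        mapping.append((i, end))
        i = end
    return simplified, mapping


def restore(mnp_labels_simplified: List[str], mapping: List[Span]) -> List[Span]:
    """S5: map MNP spans found on the simplified text back to the original."""
    return [(mapping[s][0], mapping[e - 1][1])
            for s, e in iob_to_spans(mnp_labels_simplified)]


# Hypothetical sentence with two SNPs: w1 w2 and w4 w5.
tokens = ["w0", "w1", "w2", "w3", "w4", "w5"]
snp_labels = ["O", "B", "I", "O", "B", "I"]
simplified, mapping = simplify(tokens, snp_labels)

# Suppose the S4 model labels the simplified text like this:
mnp_on_simplified = ["O", "B", "I", "I"]
mnp_original = restore(mnp_on_simplified, mapping)
print(simplified, mnp_original)
```

Under this assumption the MNP model in S4 sees a four-token sequence instead of six, which shows how simplification shortens the phrases the classifier must bracket.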

Description

Technical field

[0001] The invention relates to the fields of natural language processing, machine learning and the like, and is a method suitable for recognizing maximal noun phrases in Chinese.

Background

[0002] With the continuous development of Internet technology, online economic and trade activities between countries are becoming more and more frequent, and large amounts of text are spreading across the Internet at an explosive rate. Research on natural language processing and related areas has therefore become urgent. Within it, the recognition of MNPs (maximal noun phrases, the longest noun phrases) is a basic task.

[0003] Natural language processing includes several levels: lexical analysis, syntactic analysis, semantic analysis and pragmatic analysis. At present, lexical analysis technology is relatively mature, and word segmentation and part-of-speech tagging have achieved high accuracy, but the resul...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F17/27, G06K9/62
CPC: G06F40/289, G06F40/30, G06F18/2411
Inventors: 黄德根, 田雪
Owner: DALIAN UNIV OF TECH