Chinese lexical analysis method

A Chinese lexical analysis method, applied in the field of Chinese lexical analysis, that addresses the problems of complex calculation, a large amount of computation, and the absence of named entity recognition.

Active Publication Date: 2013-12-25
ANYANG NORMAL UNIV

AI Technical Summary

Problems solved by technology

Existing methods are computationally complex and expensive, and do not include named entity recognition.



Examples


Embodiment Construction

[0040] In order to enable those skilled in the art to better understand the solution of the present invention, the present invention will be further described in detail below in conjunction with the accompanying drawings and specific embodiments.

[0041] Referring to Figure 1, the present invention is a Chinese lexical analysis method comprising the following steps:

[0042] Z1. Obtain feature functions and weights from a given training corpus:

[0043] Set the size of the sample window and select a feature template set. From the given training corpus, context features are expanded through the feature template set according to the set sample window size. Each context feature corresponds to a set of feature functions, so multiple sets of context features correspond to multiple sets of feature functions. The weights of these feature functions are then obtained, and the weights together form a weight vector;
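The expansion of context features described in [0043] can be sketched as follows. This is a minimal illustration, not the patent's implementation: the template set `TEMPLATES`, the padding symbol `PAD`, the feature string format, and the helper names `expand_features` and `tag_score` are all assumptions introduced here. Each template is a tuple of offsets relative to the current position, and templates that reach outside the sample window are skipped.

```python
from typing import Dict, List, Sequence, Tuple

PAD = "<PAD>"  # hypothetical symbol for positions outside the sentence

# Hypothetical feature template set: offsets relative to the current position.
TEMPLATES: List[Tuple[int, ...]] = [(-1,), (0,), (1,), (-1, 0), (0, 1)]


def expand_features(chars: Sequence[str], i: int, window: int,
                    templates: Sequence[Tuple[int, ...]] = TEMPLATES) -> List[str]:
    """Expand the context features at position i using the template set."""
    features = []
    for offsets in templates:
        # Skip templates that look further than the sample window allows.
        if any(abs(o) > window for o in offsets):
            continue
        parts = [chars[i + o] if 0 <= i + o < len(chars) else PAD
                 for o in offsets]
        name = ",".join(str(o) for o in offsets)
        value = "|".join(parts)
        features.append(name + "=" + value)
    return features


def tag_score(features: Sequence[str], tag: str,
              weights: Dict[Tuple[str, str], float]) -> float:
    """Sum the weights of the indicator feature functions that fire.

    A feature function here is the pair (context feature, tag); it fires
    when that feature is present and the candidate tag matches.
    """
    return sum(weights.get((f, tag), 0.0) for f in features)


if __name__ == "__main__":
    sentence = list("中文分词")
    feats = expand_features(sentence, 1, window=1)
    print(feats)
    toy_weights = {("0=文", "I"): 1.5, ("-1=中", "I"): 0.5}  # made-up weights
    print(tag_score(feats, "I", toy_weights))
```

In this sketch the weight vector is stored as a dictionary keyed by (feature, tag), so feature functions that never appeared in training simply contribute zero.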

[0044] The key to feature selection is to select ...


Abstract

The invention discloses a Chinese lexical analysis method comprising the following steps: (1) obtaining feature functions and weights from a given training corpus; (2) segmenting an input Chinese text into multiple sentences, each sentence being a word sequence; (3) calculating the conditional probability of every possible lexical information tagging sequence for the word sequences corresponding to the input Chinese text; (4) determining the final lexical information tagging sequence for those word sequences; (5) performing Chinese word segmentation, Chinese part-of-speech (POS) tagging and Chinese named entity recognition to obtain the final Chinese lexical analysis result. The method unifies the three subtasks of Chinese lexical analysis in a single word sequence tagging framework. It overcomes the defects of errors being propagated upward, amplified and accumulated, and of multiple kinds of information being difficult to integrate and use. The calculation is simple and the amount of computation is small. No dictionary is needed, and unknown words can also be segmented and tagged well.
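Steps (3) to (5) can be illustrated with a small sketch, under several assumptions that do not come from the patent text: a log-linear scoring model with per-symbol and tag-transition feature functions, a combined tag format `position-POS` (for example `B-n`, `I-n`), and the helper names below. The conditional probability of each candidate tagging sequence is obtained by exhaustive enumeration, which is exponential in sentence length and only workable for toy inputs; a practical system would normally use dynamic programming instead. Named entity labels could be folded into the tag in the same way as the POS label.

```python
import itertools
import math
from typing import Dict, List, Sequence, Tuple


def sequence_score(chars: Sequence[str], tags: Sequence[str],
                   weights: Dict[tuple, float]) -> float:
    """Sum the weights of the feature functions that fire on (chars, tags)."""
    score = 0.0
    prev = "<START>"
    for ch, tag in zip(chars, tags):
        score += weights.get(("char", ch, tag), 0.0)     # symbol/tag feature
        score += weights.get(("trans", prev, tag), 0.0)  # tag transition feature
        prev = tag
    return score


def conditional_probabilities(chars: Sequence[str], tagset: Sequence[str],
                              weights: Dict[tuple, float]) -> Dict[tuple, float]:
    """Return P(tags | chars) for every candidate tagging sequence."""
    candidates = list(itertools.product(tagset, repeat=len(chars)))
    exp_scores = [math.exp(sequence_score(chars, c, weights)) for c in candidates]
    z = sum(exp_scores)  # normalisation constant over all candidates
    return {c: s / z for c, s in zip(candidates, exp_scores)}


def decode(chars: Sequence[str], tags: Sequence[str]) -> List[Tuple[str, str]]:
    """Read segmentation and POS off combined tags such as 'B-n' / 'I-n'."""
    words: List[List[str]] = []
    for ch, tag in zip(chars, tags):
        position, pos = tag.split("-", 1)
        if position == "B" or not words:
            words.append([ch, pos])   # start a new word
        else:
            words[-1][0] += ch        # extend the current word
    return [(w, p) for w, p in words]


if __name__ == "__main__":
    sentence = list("中文分词")
    toy_weights = {  # made-up weights for illustration only
        ("char", "中", "B-n"): 1.0, ("char", "文", "I-n"): 1.0,
        ("char", "分", "B-n"): 1.0, ("char", "词", "I-n"): 1.0,
    }
    probs = conditional_probabilities(sentence, ["B-n", "I-n"], toy_weights)
    best = max(probs, key=probs.get)  # final tagging sequence
    print(best, decode(sentence, best))
```

Because segmentation and tagging are read off one tagging sequence, no earlier stage passes its errors to a later one, which is the benefit the abstract attributes to the unified framework.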

Description

Technical Field

[0001] The invention relates to a Chinese lexical analysis method.

Background Technique

[0002] In the field of Chinese information processing, Chinese lexical analysis is one of the important basic research topics. It is not only the basis of deep Chinese information processing such as syntactic analysis, semantic analysis, and text understanding, but also a key link in applications such as machine translation, question answering systems, information retrieval, and information extraction. Chinese lexical analysis mainly includes three sub-tasks of Chinese word segmentation, part-of-speech tagging and named entity recognition. In some related evaluations at home and abroad, they are often evaluated as three independent tasks. In existing research, most scholars are also accustomed to considering the three sub-tasks independently, especially accustomed to processing Chinese word segmentation and part-of-speech tagging in sequence, and then considering part-o...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/27
Inventor: 于江德, 刘运通, 王希杰, 胡顺义, 郑霞, 葛彦强, 王继鹏
Owner: ANYANG NORMAL UNIV