Chinese Lexical Analysis Method

A Chinese lexical analysis method, applied in the field of Chinese lexical analysis, that addresses the complex and heavy computation of existing methods and their lack of named entity recognition.

Active Publication Date: 2016-11-30
ANYANG NORMAL UNIV

AI Technical Summary

Problems solved by technology

Existing methods are computationally complex, require a large amount of computation, and do not include named entity recognition.



Embodiment Construction

[0040] In order to enable those skilled in the art to better understand the solution of the present invention, the present invention will be further described in detail below in conjunction with the accompanying drawings and specific embodiments.

[0041] Referring to Figure 1, the present invention is a Chinese lexical analysis method comprising the following steps:

[0042] Z1. Obtain feature functions and weights from a given training corpus:

[0043] Set the size of the sample window and select a feature template set. From the given training corpus, according to the set sample window size, the context features are expanded through the feature template set. Each context feature corresponds to a set of feature functions, so multiple context features correspond to multiple sets of feature functions. The weights of these sets of feature functions are then obtained, and together the weights form a weight vector;
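The expansion of context features in [0043] can be illustrated with a minimal Python sketch. The template set, the padding symbol, the feature naming scheme, and the function names below are all assumptions made for illustration; the source does not specify them, and the training procedure that produces the weights is not shown.

```python
from typing import Dict, List, Tuple

PAD = "<PAD>"  # assumed padding symbol for positions outside the sentence

# Hypothetical feature template set: each template is a tuple of offsets
# relative to the current character (0 is the current position).
TEMPLATES: List[Tuple[int, ...]] = [(-1,), (0,), (1,), (-1, 0), (0, 1)]


def expand_features(chars: List[str], i: int, window: int = 1) -> List[str]:
    """Expand the context features at position i through the template set."""
    feats = []
    for tpl in TEMPLATES:
        # Skip templates that reach outside the sample window.
        if any(abs(off) > window for off in tpl):
            continue
        name = "|".join(f"C{off}" for off in tpl)
        value = "/".join(
            chars[i + off] if 0 <= i + off < len(chars) else PAD for off in tpl
        )
        feats.append(f"{name}={value}")
    return feats


def index_feature_functions(
    corpus: List[Tuple[List[str], List[str]]]
) -> Dict[Tuple[str, str], int]:
    """Assign an index to every (context feature, tag) pair seen in the corpus.

    Each pair stands for one binary feature function; the weight vector
    has one entry per index.
    """
    index: Dict[Tuple[str, str], int] = {}
    for chars, tags in corpus:
        for i, tag in enumerate(tags):
            for feat in expand_features(chars, i):
                index.setdefault((feat, tag), len(index))
    return index


# Toy training corpus: one two-character word tagged begin/end (assumed tags).
corpus = [(list("中文"), ["B", "E"])]
index = index_feature_functions(corpus)
weights = [0.0] * len(index)  # placeholder; real weights come from training
```

In this sketch, a larger `window` admits templates with wider offsets, so the window size and the template set jointly determine how many feature functions, and therefore how many weights, the model has.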

[0044] The key to feature selection is to select ...


Abstract

The present invention is a Chinese lexical analysis method comprising the following steps: 1) obtaining feature functions and weights from a given training corpus; 2) dividing the input Chinese text into multiple sentences, where each sentence is a character sequence; 3) calculating the conditional probability of every possible lexical information tag sequence for the character sequence corresponding to the input Chinese text; 4) determining the final lexical information tag sequence for that character sequence; 5) performing Chinese word segmentation, Chinese part-of-speech tagging, and Chinese named entity recognition to obtain the final Chinese lexical analysis result. The invention unifies the three sub-tasks of Chinese lexical analysis in a character sequence labeling framework, which overcomes the propagation, amplification, and accumulation of errors from one stage to the next and the difficulty of integrating multiple types of information. The calculation is simple and the amount of computation is small, and out-of-vocabulary words can also be segmented and labeled more accurately.
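Steps 3) to 5) of the abstract can be sketched in Python under stated assumptions. The log-linear scoring with softmax normalization, the two feature types, the combined tags such as `B-ns` (position in word plus a part-of-speech or entity label), and the toy weights are all hypothetical; the source does not give the tag set or the probability model in this excerpt. The enumeration is brute force for clarity and is only practical for very short inputs.

```python
import itertools
import math
from typing import Dict, List, Tuple

Weights = Dict[Tuple[str, str], float]


def sequence_probabilities(
    chars: List[str], tagset: List[str], weights: Weights
) -> Dict[Tuple[str, ...], float]:
    """Step 3: conditional probability of every possible tag sequence."""
    scores = {}
    for tags in itertools.product(tagset, repeat=len(chars)):
        score = 0.0
        for i, (char, tag) in enumerate(zip(chars, tags)):
            prev = tags[i - 1] if i > 0 else "<S>"
            score += weights.get((f"C0={char}", tag), 0.0)   # character feature
            score += weights.get((f"T-1={prev}", tag), 0.0)  # previous-tag feature
        scores[tags] = score
    z = sum(math.exp(s) for s in scores.values())  # normalization constant
    return {tags: math.exp(s) / z for tags, s in scores.items()}


def decode(chars: List[str], tags: Tuple[str, ...]) -> List[Tuple[str, str]]:
    """Step 5: turn per-character tags into (word, label) pairs."""
    words, buf, label = [], "", ""
    for char, tag in zip(chars, tags):
        position, label = tag.split("-", 1)
        buf += char
        if position in ("E", "S"):  # end of a word or a single-character word
            words.append((buf, label))
            buf = ""
    if buf:  # keep a trailing word that was never closed
        words.append((buf, label))
    return words


# Toy example with hand-set weights (hypothetical, not learned).
chars = list("我爱北京")
tagset = ["B-ns", "E-ns", "S-r", "S-v"]
weights = {
    ("C0=我", "S-r"): 2.0,
    ("C0=爱", "S-v"): 2.0,
    ("C0=北", "B-ns"): 2.0,
    ("C0=京", "E-ns"): 2.0,
    ("T-1=B-ns", "E-ns"): 1.0,
}
probs = sequence_probabilities(chars, tagset, weights)
best = max(probs, key=probs.get)  # step 4: pick the most probable sequence
result = decode(chars, best)
```

Because every character carries one combined tag, a single decoded sequence yields the word boundaries, the part-of-speech labels, and (through labels such as the assumed place-name tag `ns`) the named entities in one pass, which is the point of unifying the three sub-tasks.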

Description

Technical field

[0001] The invention relates to a Chinese lexical analysis method.

Background art

[0002] In the field of Chinese information processing, Chinese lexical analysis is one of the important basic research topics. It is not only the basis of deep Chinese information processing such as syntactic analysis, semantic analysis, and text understanding, but also a key link in applications such as machine translation, question answering systems, information retrieval, and information extraction. Chinese lexical analysis mainly comprises three sub-tasks: Chinese word segmentation, part-of-speech tagging, and named entity recognition. In related evaluations at home and abroad, they are often evaluated as three independent tasks. In existing research, most scholars are also accustomed to considering the three sub-tasks independently, especially to processing Chinese word segmentation and part-of-speech tagging in sequence, and then considering part-o...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F17/27
Inventor: 于江德, 刘运通, 王希杰, 胡顺义, 郑霞, 葛彦强, 王继鹏
Owner: ANYANG NORMAL UNIV