
CRF++-based automatic labeling method for a Chinese sentence meaning structure model

An automatic labeling technology for semantic structure models, applied in special data processing applications, instruments, electrical digital data processing, and related fields.

Status: Inactive
Publication Date: 2013-06-26
BEIJING INSTITUTE OF TECHNOLOGY
7 cites, 20 cited by

AI Technical Summary

Problems solved by technology

Currently, no method allows computers to analyze the Chinese semantic structure of an original sentence, and no effective linguistic feature extraction method exists for semantic analysis applications.




Embodiment Construction

[0039] In order to better illustrate the purpose and advantages of the present invention, the implementation of the method of the present invention will be further described in detail below in conjunction with the accompanying drawings and examples.

[0040] The test uses the 10,000 manually annotated sentences of the BFS-CTC Chinese annotation corpus as data and is carried out with ten-fold cross-validation.
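As a rough illustration of the ten-fold procedure, the sketch below splits a list of sentences into ten train/test pairs. The function name `ten_fold_splits` and the interleaved assignment of sentences to folds are assumptions made for this example; the source does not say how the folds were drawn or whether the data were shuffled first.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def ten_fold_splits(items: Sequence[T], k: int = 10) -> List[Tuple[List[T], List[T]]]:
    """Return k (train, test) pairs; each item is in exactly one test set.

    Hypothetical helper: folds are built by taking every k-th item,
    which is only one of several reasonable ways to partition the data.
    """
    folds = [list(items[i::k]) for i in range(k)]
    splits = []
    for i in range(k):
        test = folds[i]
        # Training data is everything outside the held-out fold.
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        splits.append((train, test))
    return splits


if __name__ == "__main__":
    sentences = list(range(10000))  # stand-in for 10,000 annotated sentences
    for train, test in ten_fold_splits(sentences):
        print(len(train), len(test))
```

With 10,000 sentences, each round trains on 9,000 sentences and tests on the remaining 1,000.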

[0041] Step 1: To realize predicate recognition, word relation recognition, and semantic case type recognition, CRF++ is first used to train the models that perform the recognition.

[0042] Step 1.1: Train the model for predicate recognition. Specifically, following the format of CRF++ training data, first convert the corpus data of the BFS-CTC Chinese annotation corpus into the CRF++ data format: the first column is the word sequence number, starting from 1. Each word (including punctuation) ha...
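The conversion can be sketched as follows, under stated assumptions. CRF++ expects one token per line with whitespace-separated columns, a blank line between sentences, and the label in the last column. The source confirms only that the first column is a word sequence number starting from 1; the part-of-speech column and the label values used here are hypothetical placeholders, since the remaining columns are cut off in the text.

```python
from typing import List, Tuple

# (word, part_of_speech, label): the last two fields are assumptions,
# because the source text is truncated after the first column.
Token = Tuple[str, str, str]


def to_crfpp(sentences: List[List[Token]]) -> str:
    """Render sentences in CRF++ column format.

    Column 1 is the 1-based word sequence number; the label is the
    final column; sentences are separated by one blank line.
    """
    blocks = []
    for sentence in sentences:
        rows = [
            f"{index}\t{word}\t{pos}\t{label}"
            for index, (word, pos, label) in enumerate(sentence, start=1)
        ]
        blocks.append("\n".join(rows))
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    sample = [[("我", "r", "O"), ("爱", "v", "PRED"), ("你", "r", "O")]]
    print(to_crfpp(sample), end="")
```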


Abstract

The invention relates to a CRF++-based automatic labeling method for a Chinese sentence meaning structure model, and belongs to the technical field of semantic analysis in computer science and natural language processing. First, the corpus data of the BFS-CTC Chinese annotation corpus are used to train a predicate recognition model, a word relation recognition model, and a semantic case type recognition model. Second, the three models are applied to an original sentence to obtain the predicate information, word relation information, and semantic case type information in the sentence. Finally, a Chinese sentence meaning structure model is obtained according to the collocation rules of the predicates, word relations, and semantic case types. For the field of semantic analysis, the method provides more numerous and more comprehensive semantic features, laying a foundation for computers to analyze the sentence meaning structure model of a sentence. The method also makes automatic labeling of the BFS-CTC Chinese annotation corpus possible, which matters for both research and practical application of the corpus data and greatly promotes expansion of the corpus.
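To make the second stage concrete, the sketch below aligns the outputs of the three recognizers into one record per word. It is only an illustration: the `TokenAnalysis` type, the `merge_labels` helper, and the tag values are hypothetical, and the collocation rules that build the final sentence meaning structure model are not given in this excerpt, so they are deliberately left out.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class TokenAnalysis:
    index: int       # 1-based word sequence number
    word: str
    predicate: str   # output of the predicate recognition model
    relation: str    # output of the word relation recognition model
    case_type: str   # output of the semantic case type recognition model


def merge_labels(
    words: Sequence[str],
    predicate_tags: Sequence[str],
    relation_tags: Sequence[str],
    case_tags: Sequence[str],
) -> List[TokenAnalysis]:
    """Align the three label sequences word by word (hypothetical helper)."""
    if not (len(words) == len(predicate_tags) == len(relation_tags) == len(case_tags)):
        raise ValueError("all sequences must have one entry per word")
    return [
        TokenAnalysis(i, w, p, r, c)
        for i, (w, p, r, c) in enumerate(
            zip(words, predicate_tags, relation_tags, case_tags), start=1
        )
    ]


if __name__ == "__main__":
    merged = merge_labels(
        ["我", "爱", "你"],
        ["O", "PRED", "O"],          # placeholder tags, not the corpus tag set
        ["R1", "R0", "R2"],
        ["agent", "-", "patient"],
    )
    for token in merged:
        print(token)
```

A rule-based step would then consume these records to assemble the structure model.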

Description

Technical field

[0001] The invention relates to a CRF++-based automatic labeling method for a Chinese semantic structure model, which belongs to the technical field of semantic analysis in computer science and natural language processing.

Background technique

[0002] Modern linguistic theory divides the language system into three levels: phonetics, grammar, and semantics. Setting phonetics aside and separating morphology (including word form and part of speech) from grammar, analysis at each level differs to some extent; moving from lexical to grammatical to semantic analysis is a progression from shallow to deep. Chinese semantic analysis is currently achieved by building corpus resources and applying machine learning methods. The commonly used corpus resources are:

[0003] 1. Chinese Proposition Bank (CPB) of the University of Pennsylvania

[0004] CPB is based on the syntactic annotation corpus CTB (Chinese Tree ...


Application Information

IPC(8): G06F17/27, G06F17/30
Inventor: 罗森林, 韩磊, 潘丽敏, 魏超
Owner: BEIJING INSTITUTE OF TECHNOLOGY