Formalized scheme for constructing Chinese tree bank based on sentence-based grammar

A Chinese treebank technology, applied in semantic analysis, natural language data processing, instruments, etc. It addresses problems such as mechanical and cumbersome sentence segmentation, high demands on annotators, and inconsistent annotation, with the effects of improving accuracy and efficiency and promoting innovation.

Status: Inactive. Publication date: 2017-05-24
彭炜明 +4
Cites: 2. Cited by: 16

AI Technical Summary

Problems solved by technology

[0003] Phrase structure grammar analyzes sentence structure by layer-by-layer binary division (dichotomy), which makes sentence segmentation overly mechanical and cumbersome. Dependency grammar analyzes sentence structure around the central (head) word, which avoids the problems of phrase structure grammar to a certain extent, but its flat presentation through dependency arcs blurs the hierarchy of the sentence. Both grammatical systems are, to a certain extent, inconsistent with how people perceive sentence grammar, so they place high demands on annotators and easily lead to inconsistent annotation.
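To make the contrast in [0003] concrete, the following minimal Python sketch (not part of the patent) encodes one toy sentence, 我 读 新 书 ("I read new books"), both ways. The POS tags and phrase labels are borrowed from Penn Chinese Treebank conventions purely for illustration; the relation labels and all function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Leaf:
    word: str
    pos: str  # illustrative part-of-speech tag


@dataclass
class Branch:
    label: str     # phrase label such as NP or VP
    left: "Node"
    right: "Node"  # strictly binary: every split is a dichotomy


Node = Union[Leaf, Branch]


def count_phrase_nodes(node: Node) -> int:
    """Count the intermediate phrase nodes created by binary splitting."""
    if isinstance(node, Leaf):
        return 0
    return 1 + count_phrase_nodes(node.left) + count_phrase_nodes(node.right)


def depth(node: Node) -> int:
    """Count the layers from the sentence node down to the deepest word."""
    if isinstance(node, Leaf):
        return 0
    return 1 + max(depth(node.left), depth(node.right))


# Phrase structure: the sentence is split layer by layer into two halves.
ps_tree = Branch("IP",
                 Leaf("我", "PN"),
                 Branch("VP",
                        Leaf("读", "VV"),
                        Branch("NP", Leaf("新", "JJ"), Leaf("书", "NN"))))

# Dependency: one arc per word, (dependent index, head index, relation).
# Indices are 1-based; head 0 marks the root, i.e. the central word.
words = ["我", "读", "新", "书"]
arcs: List[Tuple[int, int, str]] = [
    (1, 2, "subj"),
    (2, 0, "root"),
    (3, 4, "attr"),
    (4, 2, "obj"),
]


def implicit_phrase(head: int) -> List[str]:
    """Recover the words governed by `head`; the arcs never store this phrase as a node."""
    members = {head}
    changed = True
    while changed:
        changed = False
        for dep, h, _ in arcs:
            if h in members and dep not in members:
                members.add(dep)
                changed = True
    return [words[i - 1] for i in sorted(members)]


print(count_phrase_nodes(ps_tree), depth(ps_tree))  # 3 3
print(len(arcs), implicit_phrase(4))                 # 4 ['新', '书']
```

Because every split is binary, a sentence of n words always yields n-1 phrase nodes, while the dependency version stores exactly one arc per word and leaves units such as 新书 implicit, to be reconstructed by following head links.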

Examples

Specific embodiment

[0098] This embodiment of the invention provides a formalized scheme for constructing a Chinese treebank based on sentence-based grammar. The scheme takes Li's grammatical system and its diagrammatic method of sentence analysis as its prototype.

[0099] In this embodiment, the syntactic system used for constructing the sentence-based Chinese treebank is formalized as follows:

[0100] Following the shallow-to-deep progression of Chinese grammar teaching, three types of sentence patterns are designed: "basic sentence patterns", "extended sentence patterns" and "complex sentence patterns". A "basic sentence pattern" contains only the three main components (subject + predicate + object), with the predicate as the core. An "extended sentence pattern" keeps this basic structure but adds components such as attributives, adverbials, complements and independent words, or is produced by structural expansion such as double objects, juxtaposition and apposition; "complex sentence patter...
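The distinction between the two fully specified pattern types in [0100] can be sketched as a small classifier. This is a minimal Python sketch under stated assumptions, not the patent's formalization: the role names, the `Clause` and `classify` identifiers, and the rule that a clause whose roles stay within subject/predicate/object (with a required predicate) counts as basic are all hypothetical. "Complex sentence patterns" are left out because their definition is not given in full here.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class PatternType(Enum):
    BASIC = "basic sentence pattern"
    EXTENDED = "extended sentence pattern"


# Component inventories as listed in [0100]; the identifiers are hypothetical.
MAIN_COMPONENTS = {"subject", "predicate", "object"}
ADDITIONAL_COMPONENTS = {"attributive", "adverbial", "complement", "independent_word"}
STRUCTURAL_EXPANSIONS = {"double_object", "juxtaposition", "apposition"}


@dataclass
class Component:
    role: str  # one of MAIN_COMPONENTS or ADDITIONAL_COMPONENTS
    text: str


@dataclass
class Clause:
    components: List[Component]
    expansions: List[str] = field(default_factory=list)  # subset of STRUCTURAL_EXPANSIONS


def classify(clause: Clause) -> PatternType:
    """Label a single clause as a basic or an extended sentence pattern."""
    roles = {c.role for c in clause.components}
    if "predicate" not in roles:
        raise ValueError("every pattern is built around a predicate core")
    unknown = (roles - MAIN_COMPONENTS - ADDITIONAL_COMPONENTS) | (
        set(clause.expansions) - STRUCTURAL_EXPANSIONS)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    # Basic: nothing beyond subject/predicate/object and no structural expansion.
    if roles <= MAIN_COMPONENTS and not clause.expansions:
        return PatternType.BASIC
    return PatternType.EXTENDED


basic = Clause([Component("subject", "我"), Component("predicate", "读"),
                Component("object", "书")])
extended = Clause([Component("subject", "我"), Component("adverbial", "认真地"),
                   Component("predicate", "读"), Component("attributive", "新"),
                   Component("object", "书")])

print(classify(basic).value)     # basic sentence pattern
print(classify(extended).value)  # extended sentence pattern
```

Keeping structural expansions in a separate field from additional components mirrors the two routes to an extended pattern named in [0100]: adding components versus expanding the structure itself.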

Abstract

The invention discloses a formalized scheme for constructing a Chinese treebank based on sentence-based grammar, and relates to the fields of corpus linguistics and natural language processing. The scheme takes the sentence-based grammar of traditional Chinese teaching grammar as its prototype and, during its design, incorporates research results on "dynamic words" from the linguistics community. Adopting this scheme helps improve the accuracy and efficiency of treebank construction, while promoting communication and integration among three fields: information processing, grammar research and teaching practice.

Description

Technical field

[0001] The invention relates to the fields of corpus linguistics and natural language processing, and in particular to a formalized scheme for constructing a Chinese treebank based on sentence-based grammar.

Background art

[0002] A treebank is a deeply processed corpus in which syntactic structure is annotated according to a specific grammatical system; it is a product of the relatively mature development of corpus linguistics and natural language processing technology. Phrase structure grammar and dependency grammar dominate the grammatical systems currently used in treebank construction. Phrase structure grammar derives from Chomsky's formal grammar theory, of which context-free grammar is widely used in natural language processing. It reduces several word/part-of-speech nodes into phrases according to certain grammatical production rules; these phrases in turn take part in the next reduction, until a single sentence node is finally obtained. Dependen...
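The bottom-up reduction described in [0002] can be illustrated with a toy context-free grammar. The following Python sketch uses greedy shift-reduce: after each word/part-of-speech node is pushed onto a stack, the top of the stack is repeatedly replaced by a parent phrase whenever it matches a production's right-hand side, until only the sentence node remains. The grammar, tags and function names are hypothetical, and greedy reduction is only reliable for small unambiguous grammars like this one; general-purpose parsers typically use chart algorithms such as CKY instead.

```python
from typing import Dict, List, Tuple, Union

# A node is (label, word) for a leaf or (label, [children]) for a phrase.
Node = Tuple[str, Union[str, list]]

# Hypothetical toy productions, keyed by right-hand side: RHS labels -> parent label.
RULES: Dict[Tuple[str, ...], str] = {
    ("PN",): "NP",
    ("JJ", "NN"): "NP",
    ("VV", "NP"): "VP",
    ("NP", "VP"): "IP",  # IP is the sentence node
}


def reduce_once(stack: List[Node]) -> bool:
    """Replace the top of the stack with a parent phrase if a rule matches."""
    for size in (2, 1):  # prefer the longer right-hand side
        if len(stack) >= size:
            rhs = tuple(label for label, _ in stack[-size:])
            parent = RULES.get(rhs)
            if parent is not None:
                children = stack[-size:]
                del stack[-size:]
                stack.append((parent, children))
                return True
    return False


def parse(tagged: List[Tuple[str, str]]) -> Node:
    """Shift each (word, POS) node, then reduce as far as the rules allow."""
    stack: List[Node] = []
    for word, pos in tagged:
        stack.append((pos, word))
        while reduce_once(stack):
            pass
    if len(stack) != 1 or stack[0][0] != "IP":
        raise ValueError(f"could not reduce to a sentence node: {stack}")
    return stack[0]


def to_brackets(node: Node) -> str:
    """Render a tree in bracketed treebank notation."""
    label, body = node
    if isinstance(body, str):
        return f"({label} {body})"
    return f"({label} " + " ".join(to_brackets(child) for child in body) + ")"


tree = parse([("我", "PN"), ("读", "VV"), ("新", "JJ"), ("书", "NN")])
print(to_brackets(tree))
# (IP (NP (PN 我)) (VP (VV 读) (NP (JJ 新) (NN 书))))
```

Each phrase built by one reduction immediately becomes input to the next, which is the layering that [0003] criticizes as mechanical when every split is forced into two parts.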

Application Information

Patent type and authority: Application (China)
IPC(8): G06F17/27
CPC: G06F40/211, G06F40/284, G06F40/30
Inventors: 彭炜明, 宋继华, 王宁, 宋天宝, 郭冬冬, 杨天心, 赵敏
Owner: 彭炜明