Event-based Chinese coreference corpus library establishment method

A corpus construction method in the field of event-based Chinese coreference corpus construction. It addresses the absence of a Chinese coreference corpus, among other problems, and achieves fewer classification categories, improved performance, and a clear structure.

Active Publication Date: 2017-06-27
SHANGHAI UNIV
Cites: 6; cited by: 3

AI Technical Summary

Problems solved by the technology

Events involve many kinds of entities, called event elements. As with static concepts in traditional texts, these elements are frequently the targets of reference, and events themselves are also frequently referred to. F...



Examples


Embodiment 1

[0044] Referring to Figure 1, this event-based Chinese coreference corpus construction method mainly comprises the following steps:

[0045] (1) Select the CEC2.0 corpus as the basis for construction;

[0046] (2) Determine the targets and methods of coreference annotation;

[0047] (3) Formulate annotation specifications for each specific coreference target;

[0048] (4) Preprocess the CEC2.0 corpus text;

[0049] (5) Automatically annotate event elements and event coreference;

[0050] (6) Refine the annotation results through manual annotation;

[0051] (7) Perform consistency checks to ensure the quality of the corpus annotation.
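The seven steps above can be sketched as a simple per-document pipeline. This is a minimal illustration, not the patent's implementation: the `Document` type, the stage names, and the sentence split on "。" are hypothetical, and steps (2) and (3) are treated as corpus-level design decisions made once before any document is processed, so they have no stage of their own.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Document:
    """Hypothetical container: raw text plus named annotation layers."""
    doc_id: str
    text: str
    layers: Dict[str, list] = field(default_factory=dict)
    history: List[str] = field(default_factory=list)


Stage = Callable[[Document], Document]


def verify_base(doc: Document) -> Document:
    # Step (1): check and repair inherited CEC2.0 annotations (placeholder).
    doc.layers.setdefault("events", [])
    return doc


def preprocess(doc: Document) -> Document:
    # Step (4): naive sentence split on the Chinese full stop.
    doc.layers["sentences"] = [s for s in doc.text.split("。") if s]
    return doc


def auto_annotate(doc: Document) -> Document:
    # Step (5): automatic element/event coreference tagging would go here.
    doc.layers.setdefault("coref", [])
    return doc


def manual_review(doc: Document) -> Document:
    # Step (6): human annotators refine the automatic output (no-op here).
    return doc


def consistency_check(doc: Document) -> Document:
    # Step (7): placeholder structural check; each chain must link two or more mentions.
    for chain in doc.layers.get("coref", []):
        if len(chain) < 2:
            raise ValueError(f"{doc.doc_id}: chain too short: {chain}")
    return doc


# Steps (2) and (3), choosing targets and writing specifications, happen once
# for the whole corpus, so they are not per-document stages.
PIPELINE: List[Tuple[str, Stage]] = [
    ("verify_base", verify_base),
    ("preprocess", preprocess),
    ("auto_annotate", auto_annotate),
    ("manual_review", manual_review),
    ("consistency_check", consistency_check),
]


def run_pipeline(doc: Document, stages: List[Tuple[str, Stage]] = PIPELINE) -> Document:
    """Apply each stage in order and record which stages ran."""
    for name, stage in stages:
        doc = stage(doc)
        doc.history.append(name)
    return doc


doc = run_pipeline(Document("d1", "上海发生火灾。消防员赶到现场。"))
print(doc.layers["sentences"])  # ['上海发生火灾', '消防员赶到现场']
```

Keeping each step as a separate function makes it easy to re-run only the manual review and consistency check after annotators revise a document.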

Embodiment 2

[0053] This embodiment is basically the same as Embodiment 1, with the following distinguishing features:

[0054] Step (1), selecting the CEC2.0 corpus as the basis for construction, comprises:

[0055] (1-1) Select CEC2.0 as the base corpus for construction;

[0056] (1-2) Check the accuracy of the event and event element annotations against the CEC2.0 corpus annotation specification;

[0057] (1-3) Supplement missing annotations in incompletely annotated texts and correct erroneous annotations.
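Step (1-2) checks event annotations against the CEC2.0 specification. A minimal sketch follows, assuming a simplified XML layout in which each `Event` element carries an `eid` attribute and should contain a `Denoter` (trigger word) child. These tag names, the sample text, and the single rule checked are assumptions for illustration, not a statement of the full CEC2.0 specification.

```python
import xml.etree.ElementTree as ET
from typing import List

# Assumption: each event should carry an eid and a Denoter (trigger word) child.
REQUIRED_CHILD = "Denoter"


def find_incomplete_events(xml_text: str) -> List[str]:
    """Return human-readable problems for events that break the assumed rules."""
    root = ET.fromstring(xml_text)
    problems: List[str] = []
    for i, event in enumerate(root.iter("Event")):
        eid = event.get("eid")
        label = eid if eid is not None else f"event #{i}"
        if eid is None:
            problems.append(f"{label}: missing eid attribute")
        # find() looks only at direct children of the event element.
        if event.find(REQUIRED_CHILD) is None:
            problems.append(f"{label}: missing {REQUIRED_CHILD}")
    return problems


# Hypothetical sample: e2 lacks its trigger word.
SAMPLE = """<Body>
  <Sentence>
    <Event eid="e1"><Location>上海</Location><Denoter>发生</Denoter><Object>火灾</Object></Event>
    <Event eid="e2"><Participant>消防员</Participant></Event>
  </Sentence>
</Body>"""

print(find_incomplete_events(SAMPLE))  # ['e2: missing Denoter']
```

The returned list is the work queue for step (1-3): each entry points an annotator at an event that needs supplementing or correcting.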

[0058] Step (2), determining the targets and methods of coreference annotation, comprises:

[0059] (2-1) Coreference targets fall into two categories: coreference of event elements (object, environment, and time) and coreference of events. Coreference annotation of event elements is further divided into two kinds: annotation of existing elements (those present in the text) and annotation of default elements (those omitted from the text);
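The split of coreference targets in (2-1), with event-element targets further divided into existing and default elements, can be encoded as a small label taxonomy. The sketch below is hypothetical: the class names and the rule that only element targets carry a role and a realization are illustrative choices, not taken from the patent.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class TargetKind(Enum):
    EVENT = "event"
    ELEMENT = "element"


class ElementRole(Enum):
    OBJECT = "object"
    ENVIRONMENT = "environment"
    TIME = "time"


class Realization(Enum):
    EXISTING = "existing"  # element appears in the text
    DEFAULT = "default"    # element is omitted and must be recovered from context


@dataclass(frozen=True)
class CorefTarget:
    kind: TargetKind
    role: Optional[ElementRole] = None
    realization: Optional[Realization] = None

    def __post_init__(self) -> None:
        # Element targets need both sub-labels; event targets take neither.
        if self.kind is TargetKind.ELEMENT:
            if self.role is None or self.realization is None:
                raise ValueError("element targets need a role and a realization")
        elif self.role is not None or self.realization is not None:
            raise ValueError("event targets take no role or realization")


def all_label_types() -> List[CorefTarget]:
    """Enumerate every valid label: one event type plus role x realization."""
    labels = [CorefTarget(TargetKind.EVENT)]
    for role in ElementRole:
        for realization in Realization:
            labels.append(CorefTarget(TargetKind.ELEMENT, role, realization))
    return labels


print(len(all_label_types()))  # 7
```

Validating combinations at construction time means an annotation tool built on this taxonomy cannot emit, say, a "default event" label that the scheme does not define.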

[0060] (2-2) To facilitate computer processing, all types of re...

Example 1

[0061] Example 1: Attribute annotation of object elements

[0062] to [0069] (The annotated example text survives only as fragments in this extraction: "Shanghai Municipal Government Information Office", "15:45 on the 12th release", "information", "say".)


Abstract

The invention relates to an event-based Chinese coreference corpus construction method. The method mainly comprises the following steps: (1) selecting the CEC2.0 corpus as the basis for construction; (2) determining the targets and methods of coreference annotation; (3) formulating annotation specifications for each specific coreference target; (4) preprocessing the CEC2.0 corpus text; (5) automatically annotating event elements and event coreference; (6) refining the annotation results through manual annotation; and (7) performing consistency checks to ensure the quality of the corpus annotation. The method overcomes the shortcomings of existing coreference resolution corpora: it covers all events in the corpus, it is built on Chinese syntactic and semantic analysis and so suits the characteristics of Chinese, and it performs consistency checks on the annotated corpus to ensure annotation quality.
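The abstract does not say how the consistency check in step (7) is measured. One common choice, assumed here, is Cohen's kappa between two annotators over the same items, which discounts the agreement expected by chance. In the sketch, each item is a hypothetical candidate mention pair labeled `coref` or `none`.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(a: Sequence[str], b: Sequence[str]) -> float:
    """Cohen's kappa for two annotators labeling the same items in the same order."""
    if len(a) != len(b) or not a:
        raise ValueError("need two equal-length, non-empty label sequences")
    n = len(a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: product of each annotator's label proportions, summed over labels.
    ca, cb = Counter(a), Counter(b)
    expected = sum(ca[label] * cb[label] for label in ca.keys() | cb.keys()) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


annotator_1 = ["coref", "coref", "none", "none"]
annotator_2 = ["coref", "none", "none", "none"]
print(cohens_kappa(annotator_1, annotator_2))  # 0.5
```

Here the annotators agree on 3 of 4 pairs (0.75), chance agreement is 0.5, so kappa is (0.75 - 0.5) / (1 - 0.5) = 0.5. Any acceptance threshold for the corpus would be a project decision; the source gives none.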

Description

Technical field

[0001] The invention belongs to the field of natural language processing (NLP) and relates to an event-based Chinese coreference corpus construction method.

Background

[0002] Reference is a common linguistic phenomenon that occurs frequently in everyday conversation and in text. It makes expression concise and coherent, which aids communication and writing. However, heavy use of reference makes it harder for computers to understand language and text. The main task of coreference resolution is to identify the different expressions in a text that describe the same entity. Past research concentrated largely on non-event texts and achieved some results. With the rise of the concept of the "event", more and more researchers have turned to event-oriented research. Events are related to many elements and are knowledge representation units wi...
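Coreference resolution, as described in [0002], groups the expressions that refer to the same entity. A common way to turn pairwise "these two mentions corefer" decisions into coreference chains is union-find; this is a general technique, not the patent's stated method. The sketch assumes mentions are distinct strings and links are string pairs, a simplification of real corpora, which identify mentions by offsets or IDs. The example mentions are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def build_chains(mentions: Iterable[str], links: Iterable[Tuple[str, str]]) -> List[List[str]]:
    """Merge pairwise coreference links into chains, keeping first-mention order."""
    order = list(mentions)
    parent: Dict[str, str] = {m: m for m in order}

    def find(m: str) -> str:
        # Follow parent pointers to the root, halving the path as we go.
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    for a, b in links:  # a KeyError here means a link names an unknown mention
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    chains: Dict[str, List[str]] = {}
    for m in order:
        chains.setdefault(find(m), []).append(m)
    return list(chains.values())


mentions = ["地震", "这次灾害", "救援队", "他们", "居民"]
links = [("地震", "这次灾害"), ("救援队", "他们")]
print(build_chains(mentions, links))
# [['地震', '这次灾害'], ['救援队', '他们'], ['居民']]
```

Singleton chains such as the last one are kept so that every mention appears exactly once in the output.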


Application Information

IPC(8): G06F17/30; G06F17/27
CPC: G06F16/36; G06F40/30
Inventors: 张亚军 (Zhang Yajun), 刘宗田 (Liu Zongtian), 李强 (Li Qiang), 周文 (Zhou Wen), 刘炜 (Liu Wei)
Owner: SHANGHAI UNIV