Event trigger word extraction method and system based on auto-encoder fusion document information

An auto-encoder and event trigger technology, applied in the Internet field, that addresses problems such as incomplete trigger words, ambiguous event types of polysemous words, and sparse data, so as to improve trigger word extraction, avoid manual intervention, and improve overall performance.

Active Publication Date: 2019-08-16
INST OF COMPUTING TECH CHINESE ACAD OF SCI

AI Technical Summary

Problems solved by technology

[0004] However, within a document it is difficult to resolve the event type of a polysemous word from the context of a single sentence alone. For example, "leave" can mean departing from a place or resigning from a position, and choosing between the two requires the global context and the related events in the document.
Therefore, document-level features are needed to constrain the global context. However, traditional context features are insufficient to represent the global information of a document, the dependency relations in hand-constructed features do not capture long-distance dependencies effectively, and word vectors do not encode information about the document in which the current word appears.
On the other hand, because event structures are varied and complex, the commonly used event annotation datasets are very small. The ACE2005 dataset, in the general news and forum domain, contains only 599 English documents, and the MLEE (Multi-Level Event Extraction) dataset, in the biomedical domain, contains only 262 documents.
With neural network models, this data sparsity is likely to make the extracted trigger words incomplete and inaccurate.
Some methods use large amounts of unlabeled text to introduce external domain information by training word vectors, but word vectors capture only word-level semantics and cannot effectively capture context at the sentence level or the document level.




Embodiment Construction

[0024] In order to make the object, technical solution and advantages of the present invention clearer, the present invention will be further described in detail below in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present invention, not to limit the present invention. In addition, the technical features involved in the various embodiments of the present invention described below can be combined with each other as long as they do not constitute a conflict with each other.

[0025] In order to overcome the deficiencies in the prior art, the present invention provides an event trigger word extraction method based on autoencoder fusion document information. This method uses large-scale unlabeled free text data to pre-train a document-level autoencoder language model, so that the model can effectively learn the word order and semantic information of long texts, and lea...
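The encoder half of such a document-level autoencoder can be sketched as follows. This is a minimal illustration only, not the patent's implementation: the names `GRUCell` and `encode_document` are hypothetical, bias terms are omitted, the weights are random stand-ins for parameters that would come from pre-training, and the decoder and the training loop are not shown. The final hidden state plays the role of the document vector.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class GRUCell:
    """Single GRU cell without biases (hypothetical helper for illustration)."""

    def __init__(self, input_size, hidden_size, seed=0):
        rng = random.Random(seed)
        self.hidden_size = hidden_size
        # One input matrix W and one recurrent matrix U per gate:
        # "z" = update gate, "r" = reset gate, "h" = candidate state.
        self.W = {g: rand_matrix(hidden_size, input_size, rng) for g in "zrh"}
        self.U = {g: rand_matrix(hidden_size, hidden_size, rng) for g in "zrh"}

    def step(self, x, h):
        wz, uz = matvec(self.W["z"], x), matvec(self.U["z"], h)
        wr, ur = matvec(self.W["r"], x), matvec(self.U["r"], h)
        z = [sigmoid(a + b) for a, b in zip(wz, uz)]
        r = [sigmoid(a + b) for a, b in zip(wr, ur)]
        # The reset gate scales the previous state before the candidate is computed.
        reset_h = [ri * hi for ri, hi in zip(r, h)]
        wh, uh = matvec(self.W["h"], x), matvec(self.U["h"], reset_h)
        cand = [math.tanh(a + b) for a, b in zip(wh, uh)]
        # Convention used here: z weights the candidate, (1 - z) the old state.
        return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]


def encode_document(cell, word_vectors):
    """Run the GRU over all word vectors; the last hidden state is the document vector."""
    h = [0.0] * cell.hidden_size
    for x in word_vectors:
        h = cell.step(x, h)
    return h


# Toy document of two 3-dimensional word vectors (made-up values).
cell = GRUCell(input_size=3, hidden_size=4)
doc_vector = encode_document(cell, [[0.1, 0.2, 0.3], [0.0, -0.1, 0.5]])
```

In an autoencoder setup, a decoder would be trained to reconstruct the input text from `doc_vector`, which is what pushes the vector to retain word order and semantic information.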


Abstract

The invention relates to an event trigger word extraction method based on auto-encoder fusion document information, which comprises the following steps: generating a training set from unlabeled free-text corpora and training a GRU model to construct an auto-encoder; pre-processing and labeling the training corpus and extracting the to-be-identified words; using the auto-encoder to obtain a document vector for the document in which a to-be-identified word is located, and taking the document vector as the global feature of the to-be-identified word; representing the word vector and entity type of the to-be-identified word as distributed representations that serve as its local features; splicing (concatenating) the global feature and the local features to obtain the context features of the to-be-identified word; and inputting the context features to a Bi-GRU model that performs multi-class classification to identify whether the to-be-identified word is an event trigger word and, if so, its corresponding event type.
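The vector splicing step in the abstract can be illustrated with a short sketch. The function names, the token dictionary keys, the vector sizes, and the concatenation order (global feature first) are all assumptions made for illustration; the abstract states only that the global and local features are spliced.

```python
def splice_features(doc_vector, word_vector, entity_type_vector):
    """Concatenate the global feature with the local features of one word."""
    # Local features: word vector followed by the entity-type embedding.
    local = list(word_vector) + list(entity_type_vector)
    # Global feature: the document vector, shared by every word in the document.
    return list(doc_vector) + local


def build_context_features(doc_vector, tokens):
    """Build one spliced feature vector per to-be-identified word."""
    return [
        splice_features(doc_vector, tok["word_vec"], tok["entity_vec"])
        for tok in tokens
    ]


# Toy values: a 4-dim document vector, 3-dim word vectors, 2-dim entity-type embeddings.
doc_vector = [0.2, -0.1, 0.4, 0.0]
tokens = [
    {"word_vec": [0.5, 0.1, -0.3], "entity_vec": [1.0, 0.0]},
    {"word_vec": [-0.2, 0.7, 0.9], "entity_vec": [0.0, 1.0]},
]
features = build_context_features(doc_vector, tokens)
```

Each resulting vector has length 4 + 3 + 2 = 9. The sequence of these vectors is what would be fed to the Bi-GRU classifier, which is not shown here.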

Description

Technical field

[0001] The invention belongs to the technical field of the Internet, and in particular relates to a method for extracting event trigger words that can be used in knowledge graphs.

Background art

[0002] The event extraction task aims to extract structured event information from unstructured free text, where an event is composed of event trigger words, an event type, event arguments, and the roles of those arguments. The trigger word is the most important feature word: it signals the occurrence of an event and determines the event type, and each event type in turn defines its own set of participating arguments. The event extraction task therefore mainly comprises the extraction of event trigger words and the identification of event arguments. Trigger word extraction is the basic step, and its recognition performance directly affects the accuracy of the event extraction system.

[0003] Most of the existing trigger word...
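The event structure described in [0002] (trigger word, event type, arguments, and argument roles) could be represented with a small data structure such as the one below. The class names, field names, example sentence, and the type and role labels are illustrative assumptions, not taken from the patent.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Argument:
    text: str  # the argument mention in the sentence
    role: str  # the role this argument plays in the event


@dataclass
class Event:
    trigger: str     # the word that signals the event
    event_type: str  # the type determined by the trigger in context
    arguments: List[Argument] = field(default_factory=list)


# "She left the company": here the trigger "left" is read as a resignation.
resignation = Event(
    trigger="left",
    event_type="End-Position",
    arguments=[Argument("She", "Person"), Argument("the company", "Entity")],
)

# "She left the room": the same trigger word read as a movement event.
departure = Event(
    trigger="left",
    event_type="Transport",
    arguments=[Argument("She", "Artifact"), Argument("the room", "Origin")],
)
```

The two instances share a trigger word but differ in event type, which is the kind of ambiguity that trigger word extraction has to resolve.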


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06K9/62, G06F16/36
CPC: G06F16/36, G06F18/24, G06F18/214
Inventor: 程学旗, 靳小龙, 席鹏弼, 郭嘉丰, 赵越
Owner: INST OF COMPUTING TECH CHINESE ACAD OF SCI