Method and system for mining Chinese event information

An event information mining technology, applied in the fields of instrumentation, computing, and electrical digital data processing, that addresses the time-consuming, labor-intensive, and therefore costly nature of existing event mining methods.

Active Publication Date: 2014-03-05
苏州大数据有限公司 +2
Cites: 3; Cited by: 25

AI Technical Summary

Problems solved by technology

[0004] Both of the above mining methods require reading a large amount of document information in order to compile mining rules or to label a large number of training samples, and ...



Examples


Embodiment 1

[0137] Embodiment 1 of the present invention discloses a Chinese event information mining method. Referring to Figure 1, the method includes:

[0138] S1: Analyze and process the sentences in each document of the original text to obtain the candidate template set of the original text. The candidate template set includes at least one candidate template, and each candidate template is a four-tuple consisting of a candidate event anchor of a sentence, an entity of the sentence, the syntactic path from the candidate event anchor to the entity, and the dependency path from the candidate event anchor to the entity.
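
The four-tuple described in S1 can be pictured as a small immutable record. The following is a minimal Python sketch, not the patent's implementation: the class name `CandidateTemplate`, the field names, the tuple-of-labels encoding of the two paths, and the sample path values are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class CandidateTemplate:
    """Hypothetical record for one candidate template (a four-tuple)."""
    anchor: str                       # candidate event anchor of the sentence
    entity: str                       # an entity of the sentence
    syntactic_path: Tuple[str, ...]   # anchor-to-entity path in the syntax tree
    dependency_path: Tuple[str, ...]  # anchor-to-entity path in the dependency tree


# Illustrative values only; the path labels are placeholders, not patent data.
template = CandidateTemplate(
    anchor="attacked",
    entity="Hezbollah",
    syntactic_path=("VV", "VP", "IP", "NP", "NR"),
    dependency_path=("nsubj",),
)

# frozen=True makes instances hashable, so a candidate template set
# naturally discards exact duplicates.
candidate_set = {template, template}
print(len(candidate_set))  # 1
```

Making the record frozen is a design choice of this sketch: it lets templates gathered from many sentences be stored in an ordinary set.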

[0139] It should be noted that the original text in this embodiment refers to the text from which event information is to be mined, that is, a collection of text documents without any annotation.

[0140] Referring to Figure 2, step S1 specifically includes:

[0141] S11: Perform word segmentation, entity recognition, syntax analysis, ...

[0143] Example 1: Hezbollah attacked Israel's Qiba Farm with missiles and injured 3 Israeli soldiers.

[0144] Afterwards, an entity recognition tool is called to identify the entities in each sentence of the word-segmented sentence set and to annotate the recognized entities, yielding the entity-annotated sentence set. Each entity annotation in the entity-annotated sentence set has the format "entity/entity type". For example, after entity annotation, Example 1 above becomes Example 2:

[0145] Example 2: Hezbollah/ORG attacked Israel/GPE Qiba Farm/LOC with missiles/WEA and injured 3/NUM Israeli/GPE soldiers/PER.

[0146] Here, the entity types "ORG", "WEA", "GPE", "LOC", "NUM" and "PER" denote organization, weaponry, geopolitical entity, location, quantity and person, respectively. Other commonly used entity types include "TIME", "JOB", "FAC" and "VEH", which denote time, job position, facility and means of transportation, respectively.
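
The "entity/entity type" format can be read back mechanically. The sketch below is one possible way to do so in Python and is not taken from the patent: the function name `extract_entities`, the regular expression, and the assumption that every entity is a single whitespace-delimited token (as it would be in word-segmented Chinese) are all illustrative. In the English sample, the multi-word entity is joined with an underscore purely to satisfy that assumption.

```python
import re

# Entity-type tags named in the text.
KNOWN_TYPES = {"ORG", "WEA", "GPE", "LOC", "NUM", "PER", "TIME", "JOB", "FAC", "VEH"}

# One annotated token: a run of non-space, non-slash characters, a slash
# (spaces around it are tolerated), then an uppercase tag.
_ANNOTATION = re.compile(r"([^\s/]+)\s*/\s*([A-Z]+)\b")


def extract_entities(annotated_sentence):
    """Return (entity, entity_type) pairs in order of appearance."""
    pairs = []
    for entity, tag in _ANNOTATION.findall(annotated_sentence):
        if tag in KNOWN_TYPES:  # skip anything that is not a known tag
            pairs.append((entity, tag))
    return pairs


# Hypothetical rendering of Example 2 with single-token entities.
sample = ("Hezbollah/ORG attacked Israel/GPE Qiba_Farm/LOC with missiles/WEA "
          "and injured 3/NUM Israeli/GPE soldiers/PER.")

for entity, tag in extract_entities(sample):
    print(entity, tag)
```

Tokens without a tag, such as "attacked" and "with", are simply not matched, so they never appear in the result.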

[0147] Then, a syntactic analysis tool parses each sentence in the entity-annotated sentence set to obtain its syntax tree; the syntax trees of all entity-annotated sentences in the original text form the syntax tree set.

[0148] Syntactic analysis specifically refers to the analysis of the grammatical function of words in a sentence.
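
Once a syntax tree is available, a path between two words can be read off it. The sketch below shows one common way to define such a path: climb from the anchor to the lowest common ancestor, then descend to the entity. It is an assumption of this sketch, not the patent's stated definition, and the nested-tuple tree encoding, the function names, and the tiny sample tree are hypothetical.

```python
def leaf_address(tree, word, prefix=()):
    """Return the child-index path to the node directly above `word`, or None."""
    _label, *children = tree
    for i, child in enumerate(children):
        if isinstance(child, str):
            if child == word:
                return prefix
        else:
            found = leaf_address(child, word, prefix + (i,))
            if found is not None:
                return found
    return None


def labels_along(tree, address):
    """Return the node labels from the root down to the node at `address`."""
    labels = [tree[0]]
    node = tree
    for i in address:
        node = node[i + 1]  # element 0 of each node is its label
        labels.append(node[0])
    return labels


def syntactic_path(tree, anchor, entity):
    """Labels from the anchor up to the lowest common ancestor, then down to the entity."""
    a = leaf_address(tree, anchor)
    e = leaf_address(tree, entity)
    if a is None or e is None:
        return None
    k = 0  # depth of the lowest common ancestor
    while k < min(len(a), len(e)) and a[k] == e[k]:
        k += 1
    up = labels_along(tree, a)[k:][::-1]
    down = labels_along(tree, e)[k + 1:]
    return up + down


# Hypothetical tree for a shortened sentence; each node is (label, child, ...).
tree = ("IP",
        ("NP", ("NR", "Hezbollah")),
        ("VP", ("VV", "attacked"), ("NP", ("NR", "Israel"))))

print(syntactic_path(tree, "attacked", "Hezbollah"))  # ['VV', 'VP', 'IP', 'NP', 'NR']
```

Nodes are located by child-index addresses rather than by label, so two sibling nodes that share a label are still told apart.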

[0149] After using the syntax analysis tool to analyze the syntax of the above example 2, as shown in ...


Abstract

The invention discloses a method and system for mining Chinese event information. The method defines an event sample model based on the mining requirement and instantiates it to obtain seed events, which serve as the basis for mining events from an original text. The original text is preprocessed to obtain a candidate template set. The seed events are annotated and then processed on the basis of the annotation information to obtain a seed template set. The candidate template set is then processed iteratively, according to the grade scores of the candidate templates and the semantic similarity between the candidate templates and the seed templates, to obtain a set of mined event anchors. Finally, the event type of each mined event anchor is determined from the lexical semantic similarity between that anchor and the seed event anchors in the seed event set. Chinese event information can thus be mined from the original text by annotating only a small number of samples, namely the seed events, which reduces the mining cost.

Description

Technical field

[0001] The invention belongs to the technical field of Chinese information mining, and in particular relates to a Chinese event information mining method and system.

Background

[0002] Event mining extracts the factual information that users are interested in from massive amounts of Internet text, providing a basis for subsequent analysis and decision-making. For example, mining events related to terrorist attacks (including attack events, death events, etc.) from the Internet can be used to analyze the security situation of various countries and regions. Research on event mining methods, especially Chinese event mining methods, therefore has important application value.

[0003] The purpose of Chinese event mining is to mine the anchors of specific events from the original text and to determine the event type corresponding to each mined anchor. At present, Chinese event mining methods mainly include manual rule method and su...


Application Information

IPC(8): G06F17/30
CPC: G06F16/335
Inventors: 李培峰, 周国栋, 朱巧明, 孔芳
Owner: 苏州大数据有限公司