Incident location extraction method oriented to Chinese news texts

A news text technology, applied in natural language processing, public opinion analysis and text mining, that addresses problems such as the inability to identify where reported events took place.

Status: Inactive. Publication date: 2015-06-24
XIAN JIAOTONG UNIV CITY COLLEGE
Cites: 4; cited by: 13

AI Technical Summary

Problems solved by technology

The earlier patent cited in the background can only recognize place names in a text, not the location where an event occurred.

Detailed Description of Embodiments

[0036] The present invention is described in detail below with reference to the accompanying drawings.

[0037] With reference to the accompanying drawings, the method is implemented in three steps: candidate event location extraction, feature vector construction, and event location identification. Each step is described below:

[0038] Step 1: Candidate Event Location Extraction

[0039] a) First, use the ICTCLAS Chinese word segmentation tool to segment the Chinese news text $T$ into a sequence of two-tuples $S_T = (w_1, p_1), (w_2, p_2), \ldots, (w_i, p_i), \ldots, (w_n, p_n)$, where $n > 0$ is the number of segmented words, $w_i$ is the $i$-th word produced by ICTCLAS, and $p_i$ is the part-of-speech tag of $w_i$;
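
ICTCLAS itself is a separate segmenter and is not called here. Assuming its tagged output has been saved in the common space-separated `word/tag` form, the sequence $S_T$ can be rebuilt with a few lines of Python. This is a minimal sketch; `parse_tagged_text` is a hypothetical helper, not part of ICTCLAS.

```python
from typing import List, Tuple


def parse_tagged_text(tagged: str) -> List[Tuple[str, str]]:
    """Rebuild S_T = [(w_1, p_1), ..., (w_n, p_n)] from 'word/tag' output.

    Assumption: tokens are separated by whitespace and each token ends in
    '/<part-of-speech tag>'.
    """
    s_t: List[Tuple[str, str]] = []
    for token in tagged.split():
        # Split on the LAST slash so a '/' inside the word (e.g. "6/19") survives.
        word, sep, pos = token.rpartition("/")
        if not sep or not word:
            # No usable tag: skip the fragment (a simplifying assumption).
            continue
        s_t.append((word, pos))
    return s_t
```

For instance, a hand-tagged fragment such as `墨西哥/ns 洛斯卡沃斯/ns 举行/v 峰会/n` ("Mexico", "Los Cabos", "held", "summit") becomes four `(w_i, p_i)` pairs.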

[0040] b) From $S_T$, select in order every two-tuple $(w_i, p_i)$ that satisfies one of three conditions: $p_i$ = "ni", $p_i$ = "nl" or $p_i$ = "ns". The tags "ni", "nl" and "ns" respectively represent the corresponding w ...
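
Step b) amounts to a filter over $S_T$. The sketch below assumes $S_T$ is a list of `(word, tag)` pairs; the helper name `candidate_locations` is hypothetical, and the English glosses in `CANDIDATE_TAGS` assume the tags follow the order of the abstract's list (organization names, location nouns, place names).

```python
from typing import List, Sequence, Tuple

# Assumed meaning of each tag, following the order of the abstract's list.
CANDIDATE_TAGS = {
    "ni": "organization name",
    "nl": "location noun",
    "ns": "place name",
}


def candidate_locations(s_t: Sequence[Tuple[str, str]]) -> List[Tuple[int, str, str]]:
    """Scan S_T in order and keep (i, w_i, p_i) whenever p_i is ni, nl or ns.

    The 1-based index i (matching w_1..w_n) is kept so later steps can tell
    where in the text each candidate occurred.
    """
    return [
        (i, word, pos)
        for i, (word, pos) in enumerate(s_t, start=1)
        if pos in CANDIDATE_TAGS
    ]
```

This sketch keeps every occurrence rather than deduplicating, so a place mentioned twice yields two candidates; the text shown here does not say whether repeated mentions are merged.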

Abstract

The invention provides an incident location extraction method oriented to Chinese news texts. First, the Chinese news text T is segmented into words with the ICTCLAS Chinese word segmentation tool, and the words tagged as organization names, location nouns or place names are selected to form a candidate incident location set. Next, a three-dimensional feature vector consisting of a context feature, a position feature and a topology feature is built for each word in the candidate set. Finally, a Random Forest classifier uses these feature vectors to perform binary classification of every candidate as an incident location or a non-incident location, which completes the extraction. The method makes combined use of several types of features in news texts: it extracts context, position and topology features to form the feature vectors and applies a Random Forest classifier to the organization names, location nouns and place names obtained from the segmented words in order to recognize incident locations. In this way, the places where news events occur can be identified on top of place-name recognition.
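
The final classification step can be sketched as follows. Because the definitions of the context, position and topology features are not reproduced here, the feature values in any example are made up, and the classifier is a deliberately tiny stand-in for Random Forest rather than the method's actual configuration: bagged decision stumps, each fitted on a bootstrap sample and restricted to one randomly chosen feature, combined by majority vote. `Stump`, `fit_stump` and `TinyForest` are hypothetical names; a real system would more likely use full-depth trees, for example scikit-learn's `RandomForestClassifier`.

```python
import random
from collections import Counter
from typing import List, Sequence

# One candidate's (context, position, topology) feature vector; values are hypothetical.
Vector = Sequence[float]


class Stump:
    """Depth-1 decision tree on a single feature."""

    def __init__(self, feature: int, threshold: float, flip: bool) -> None:
        self.feature = feature
        self.threshold = threshold
        self.flip = flip

    def predict(self, x: Vector) -> int:
        # Without flip: 1 (event location) if the feature exceeds the threshold;
        # with flip: 1 if it does not.
        above = x[self.feature] > self.threshold
        return int(above != self.flip)


def fit_stump(X: List[Vector], y: List[int], feature: int) -> Stump:
    """Choose the threshold and direction on one feature with the fewest errors."""
    best, best_err = Stump(feature, 0.0, False), len(y) + 1
    for t in sorted({x[feature] for x in X}):
        for flip in (False, True):
            stump = Stump(feature, t, flip)
            err = sum(stump.predict(x) != label for x, label in zip(X, y))
            if err < best_err:
                best, best_err = stump, err
    return best


class TinyForest:
    """Bagged stumps, one random feature each, combined by majority vote."""

    def __init__(self, n_trees: int = 25, seed: int = 0) -> None:
        self.n_trees = n_trees
        self.rng = random.Random(seed)
        self.trees: List[Stump] = []

    def fit(self, X: List[Vector], y: List[int]) -> "TinyForest":
        self.trees = []
        n, d = len(X), len(X[0])
        for _ in range(self.n_trees):
            idx = [self.rng.randrange(n) for _ in range(n)]  # bootstrap sample
            feature = self.rng.randrange(d)  # random feature for this tree
            self.trees.append(
                fit_stump([X[i] for i in idx], [y[i] for i in idx], feature)
            )
        return self

    def predict(self, x: Vector) -> int:
        # 1 = event location, 0 = not an event location; ties go to 0.
        votes = Counter(tree.predict(x) for tree in self.trees)
        return 1 if votes[1] > votes[0] else 0
```

Usage: `TinyForest(n_trees=25, seed=0).fit(X, y).predict(x)`, where each row of `X` is one candidate's three-dimensional vector and `y` holds 1 for event locations and 0 otherwise. Random feature choice and bootstrap sampling are what make the ensemble a (very shallow) random forest rather than a single tree.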

Description

Technical Field

[0001] The invention relates to text mining, natural language processing and public opinion analysis in computer science and technology, and in particular to a method for extracting event locations from Chinese news texts.

Background Art

[0002] News texts contain words and phrases such as organization names, location nouns and place names, but these are not necessarily the places where the reported events occurred. For example, the news text "On June 19, 2012, during the G20 summit held in Los Cabos, Mexico, Argentine President Cristina delivered a letter to British Prime Minister Cameron concerning the sovereignty of the Malvinas Islands" contains the place names "Mexico", "Los Cabos" and "Malvinas Islands", yet the Malvinas Islands are not where the event took place. Identifying the event location among organization names, location nouns and place names is therefore a difficult problem in event extraction.

[0003] A patent ...

Application Information

Patent type and authority: Application (China)
IPC(8): G06F17/27; G06F17/30
Inventors: 何绯娟 (He Feijuan), 孙霞 (Sun Xia), 缪相林 (Miao Xianglin)
Owner XIAN JIAOTONG UNIV CITY COLLEGE