Character relationship extraction method oriented to headline

A technology for extracting character relationships, applied in the information field. It addresses problems such as inaccurate processing, poor domain migration, and poor classification effect, and achieves the effects of reducing feature dimensions, improving judgment efficiency, and improving domain migration.

Active Publication Date: 2016-05-25
INST OF INFORMATION ENG CHINESE ACAD OF SCI
1 Cites, 14 Cited by

AI Technical Summary

Problems solved by technology

[0004] The main problem with pattern matching is that most templates are written by hand. Besides consuming considerable human effort, this makes it difficult to build a comprehensive and accurate template set when the data scale is large. In addition, when the domain changes, the original templates are not necessarily still applicable and often have to be rewritten, so domain migration is poor.
[0005] Semantic analysis methods depend on the accuracy of word segmentation, part-of-speech tagging, and dependency analysis, and existing tools cannot perform these tasks accurately. Moreover, news headlines are relatively concise and sometimes do not follow general syntactic rules, which further reduces the accuracy of semantic analysis.
[0006] The feature classification method has three problems. First, the feature dimension extracted from the entire corpus is often very high, which makes training and testing the classifier inefficient. Second, when the classification effect is poor, it is difficult to find the specific examples responsible; all that can be done is to adjust the classifier parameters or change the feature selection. Third, when the feature distributions of the training data and the test data differ greatly, the classification effect is very poor, and it is difficult to construct a relatively complete training data set.

Examples

[0051] Example of the present invention: Task 1 of the CCML2015 machine learning competition.

[0052] The task is to determine, based on a news headline, whether a given triple of S (Subject), P (Predicate, here the character relationship), and O (Object) is correct. There are 19 types of character relationship (P): same school girl, former rival, teacher, shirtless, ex-girlfriend, idol, ambiguous, rumored girlfriend, rumored discord, ex-wife, girlfriend, wife, friend, breakup, copycat, classmate, agent, fellow villager, and cohabitation. The competition task also provides 11 types of attributes for each person (S / O): name, gender, ethnicity, height, weight, occupation, birthplace, native place, date of birth, date of death, and alias.
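The task input described in [0052] can be pictured as a small data structure. The following is only a sketch under stated assumptions: the names `Triple`, `RELATION_TYPES`, and `is_known_relation` are hypothetical and do not come from the patent or the competition, and the relation labels are the translated labels from the paragraph above with casing normalized.

```python
from dataclasses import dataclass

# The 19 relation labels listed in [0052] (translated labels, casing normalized).
RELATION_TYPES = frozenset({
    "same school girl", "former rival", "teacher", "shirtless",
    "ex-girlfriend", "idol", "ambiguous", "rumored girlfriend",
    "rumored discord", "ex-wife", "girlfriend", "wife", "friend",
    "breakup", "copycat", "classmate", "agent", "fellow villager",
    "cohabitation",
})


@dataclass(frozen=True)
class Triple:
    """One candidate (S, P, O) statement to be checked against a headline."""
    subject: str    # S: first person
    predicate: str  # P: character relationship label
    obj: str        # O: second person


def is_known_relation(triple: Triple) -> bool:
    """Return True if the predicate is one of the 19 task relation types."""
    return triple.predicate in RELATION_TYPES
```

A check like `is_known_relation` only validates the label; deciding whether the triple is actually supported by the headline is the job of the method itself.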

[0053] (1) First, the data is cleaned: heuristic rules are formulated to directly identify news headlines that do not meet the conditions. The heuristic rules are as follows:

[0054] ① ...

Abstract

The invention relates to a headline-oriented character relationship extraction method. The method comprises the following steps: 1) finding a relationship designator in the headline to distinguish character relationships of different categories; 2) building, from the positions of the characters and the relationship designator in the headline, a sentence pattern template that describes the sentence, using training data to count the positive and negative examples of each template, and judging the correctness of the character relationship in the headline according to the ratio of positive to negative examples; and 3) extracting features from the headlines and from the knowledge base of character attributes, and combining them with the positive and negative example counts of the sentence pattern templates obtained in step 2) to judge, through a feature classification method, whether the given character relationship is correct. The method lowers the feature dimension and improves judgment efficiency while maintaining accuracy. It can be used to mine character relationships in headlines in order to find central figures, hot issues, and the like in society, which helps in tracking social dynamics and monitoring public sentiment.
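Step 2) of the abstract can be illustrated with a short sketch. This is not the patented implementation: it assumes the simplest possible template, namely the order in which the subject (S), relationship designator (P), and object (O) first appear in the headline, and it assumes a hypothetical decision threshold on the share of positive examples. The function names `make_template`, `count_templates`, and `judge` are made up for illustration.

```python
from collections import defaultdict


def make_template(headline, subject, obj, designator):
    """Return the order in which S, P, O first appear (e.g. 'S-P-O').

    Returns None if any of the three strings is missing from the headline.
    """
    positions = {
        "S": headline.find(subject),
        "O": headline.find(obj),
        "P": headline.find(designator),
    }
    if min(positions.values()) < 0:
        return None
    return "-".join(sorted(positions, key=positions.get))


def count_templates(training):
    """Count positive and negative training examples per template.

    `training` is an iterable of (headline, subject, obj, designator, label).
    """
    counts = defaultdict(lambda: [0, 0])  # template -> [positives, negatives]
    for headline, subject, obj, designator, label in training:
        template = make_template(headline, subject, obj, designator)
        if template is not None:
            counts[template][0 if label else 1] += 1
    return counts


def judge(template, counts, threshold=0.5):
    """Judge a relationship by the positive share of its template.

    Returns None for an unseen template, so that the decision can be left
    to a later stage such as the classifier of step 3). The threshold is
    an assumed value, not one given in the source.
    """
    pos, neg = counts.get(template, (0, 0))
    if pos + neg == 0:
        return None
    return pos / (pos + neg) >= threshold
```

In this sketch the per-template counts could also be passed on as features to the classifier in step 3), which is how the abstract describes combining the two stages.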

Description

Technical field

[0001] The invention belongs to the field of information technology, and in particular relates to a method for extracting character relationships from news headlines.

Background

[0002] Character relationship extraction is an important branch of entity relationship extraction. An entity relationship is the semantic connection that exists between entities. The Automatic Content Extraction (ACE) conference defines entity relationship extraction as follows: given predefined entity relationship types, determine whether a semantic relationship exists between entities or whether it belongs to a given relationship type. Character relationship extraction restricts the entities to people and the relationship types to relationships between characters. At present, the main methods of character relationship extraction include: pattern matching, semantic analysis, feature classification, e...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/27
CPC: G06F40/258, G06F40/289
Inventor: 柳厅文, 亚静, 张浩亮, 时金桥, 赵佳鹏, 闫旸, 李全刚, 张洋
Owner: INST OF INFORMATION ENG CHINESE ACAD OF SCI