Distributed semantic and sentence meaning characteristic fusion-based character relation extraction method

A person-relationship extraction technology based on distributed semantics, applied in semantic analysis, special-purpose data processing applications, natural language data processing, etc. It addresses poor classification results caused by difficult feature selection and insufficient feature analysis in machine learning algorithms, and improves accuracy.

Status: Inactive. Publication date: 2017-03-08
BEIJING INSTITUTE OF TECHNOLOGY
Cites: 3. Cited by: 52.

AI Technical Summary

Problems solved by technology

[0019] To address the problems that feature selection for machine learning algorithms is difficult and feature analysis is insufficient, which lead to poor classification results, the present invention proposes a character relationship extrac...



Examples


Detailed description of the embodiments

[0069] In order to better illustrate the purpose and advantages of the present invention, the implementation of the method of the present invention will be further described in detail below in conjunction with the accompanying drawings and examples.

[0070] The data source is the BFS popular-person search corpus, covering "Yao Ming", "Liu Xiang", "Jay Chou", "James", "Jackie Chan", "Kobe", "Nicholas Tse", and others. The labeled corpus contains 1540 texts in total, including 2389 sentences that contain at least two person names; in addition, there are 10000 unlabeled sentences. The data source is described in Table 1; the number of person entities was obtained by manual counting.

[0071] Table 1. Experimental data sources for character relationship extraction

[0072]

[0073] In order to verify the character relationship extraction method, three experiments were conducted:

[0074] (1) Parameter selection experiment: select the optimal combination of thresholds K...


Abstract

The invention relates to a character relation extraction method based on the fusion of distributed semantic and sentence-meaning features, and belongs to the field of natural language processing. The method first trains on a small labeled corpus and a large unlabeled corpus, using statistical word-frequency features and a Bootstrapping algorithm, to obtain a relational feature dictionary. Second, it constructs triple instances from each sentence using an element-distance optimization rule, and builds a triple feature space by fusing distributed semantic information with sentence-meaning information. Finally, it makes a true/false binary decision on each triple and obtains the character relation type using a confidence-maximization rule. The method generates the relational feature dictionary automatically; it converts the conventional multi-class relation problem into a binary true/false decision on triples, which suits conventional machine learning classification algorithms better; and, by using distributed semantic information, it improves the accuracy of relation classification.
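The final step of the abstract, a true/false decision on each candidate triple followed by a confidence-maximization rule, can be sketched as below. This is a minimal illustration under stated assumptions, not the patented implementation: the two-element feature vector (a hypothetical distributed-semantic similarity score and a hypothetical sentence-level keyword-proximity score), the hand-written logistic-regression classifier, the 0.5 acceptance threshold, the toy numbers, and the names `Triple`, `train_binary`, and `decide_relation` are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Triple:
    """A candidate (person, relation, person) triple with its fused feature vector.

    Assumed feature layout (illustrative only): [distributed-semantic
    similarity, sentence-level keyword proximity], each scaled to [0, 1].
    """
    person1: str
    relation: str
    person2: str
    features: List[float]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def score(w: List[float], b: float, x: List[float]) -> float:
    """Confidence that a triple is true under a logistic model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train_binary(samples: List[List[float]], labels: List[int],
                 epochs: int = 1000, lr: float = 0.5) -> Tuple[List[float], float]:
    """Fit one true/false triple classifier (logistic regression, batch gradient descent)."""
    dim, n = len(samples[0]), len(samples)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(samples, labels):
            err = score(w, b, x) - y  # derivative of log-loss w.r.t. the logit
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def decide_relation(candidates: List[Triple], w: List[float], b: float,
                    threshold: float = 0.5) -> Tuple[Optional[str], float]:
    """Confidence maximization: return the relation of the most confident
    triple, or None if no triple reaches the (assumed) threshold."""
    best_relation, best_conf = None, 0.0
    for triple in candidates:
        conf = score(w, b, triple.features)
        if conf > best_conf:
            best_relation, best_conf = triple.relation, conf
    if best_conf < threshold:
        return None, best_conf
    return best_relation, best_conf


if __name__ == "__main__":
    # Toy training triples (invented numbers): 1 = true triple, 0 = false triple.
    train_x = [[0.9, 0.8], [0.8, 0.9], [0.85, 0.7], [0.1, 0.2], [0.2, 0.1], [0.15, 0.3]]
    train_y = [1, 1, 1, 0, 0, 0]
    w, b = train_binary(train_x, train_y)

    # One person pair in one sentence, one candidate triple per dictionary relation.
    candidates = [
        Triple("Person A", "spouse", "Person B", [0.88, 0.75]),
        Triple("Person A", "teammate", "Person B", [0.2, 0.25]),
    ]
    print(decide_relation(candidates, w, b))
```

The point the sketch tries to capture is the one the abstract claims: a single binary classifier is shared across all relation types, and the multi-class answer falls out of ranking the per-relation triples by confidence, so any conventional binary learner could stand in for the logistic model used here.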

Description

Technical field

[0001] The invention relates to a method for automatically extracting character relationships from Chinese texts or Chinese text collections, and belongs to the technical field of computer science and information extraction.

Background art

[0002] Character relationship extraction is the accurate, rapid, and automatic extraction of the person entities scattered throughout a text and of the relationships between them; it is a research topic within the field of information extraction.

[0003] Information extraction (IE) technology addresses two major research tasks: entity detection and recognition (EDR) and relation detection and recognition (RDR). Relation recognition (also called "relation extraction") extracts the relationships that exist between entities in a text, where the relationship types are predefined. Character relationship is a kind of entity...


Application Information

IPC(8): G06F17/27
CPC: G06F40/211; G06F40/30
Inventors: 罗森林 (Luo Senlin), 焦龙龙 (Jiao Longlong), 潘丽敏 (Pan Limin), 郭佳 (Guo Jia), 吴舟婷 (Wu Zhouting), 陈倩柔 (Chen Qianrou)
Owner: BEIJING INSTITUTE OF TECHNOLOGY