Method utilizing Chinese online resources for supervising extraction of character relations remotely

A technology for extracting character relationships under remote supervision, applied in fields such as special data processing applications, instruments, and calculation. It addresses problems such as insufficient coverage of relationship types and reduced accuracy, with the effect of ensuring the accuracy of the extracted relations.

Active Publication Date: 2014-09-10
EAST CHINA NORMAL UNIV

AI Technical Summary

Problems solved by technology

However, the semi-supervised learning method has the following shortcomings: 1) continuous iteration is prone to semantic drift, which reduces accuracy; 2) the relationship types of characters must be defined in advance, so the defined set of relationship types may not be comprehensive enough. For example, none of the relationship types defined in previous methods covers infrequent relationships such as "hostile" and "neighborhood".
However, the remote supervision method has long gone unapplied to the extraction of Chinese character relationships, a gap that is closely tied to the lack of a large-scale, available Chinese relationship knowledge base.

Examples

Embodiment

[0027] The invention uses a Chinese online encyclopedia to construct a knowledge base and extracts character relationships from a raw corpus. In the following embodiment, data from Hudong Baike (the "interactive encyclopedia") is used to build the knowledge base, and the SogouC corpus released by Sogou Laboratory is used as the raw corpus. The invention is further described below in conjunction with the accompanying drawings.

[0028] Referring to Figure 1, a knowledge base structured as triplets is first constructed automatically from the Hudong Baike interactive encyclopedia.
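To make the triplet-structured knowledge base of [0028] concrete, here is a minimal sketch of one way it could be held in memory. The exact triplet fields are not shown in this text, so the layout (person_a, person_b, relation), the names `Triplet` and `KnowledgeBase`, and the placeholder entries in the usage note are all assumptions for illustration, not the patent's actual data model.

```python
from collections import defaultdict
from typing import NamedTuple, Set


class Triplet(NamedTuple):
    # Assumed layout: two person names and the relation type linking them.
    person_a: str
    person_b: str
    relation: str


class KnowledgeBase:
    """Hypothetical in-memory store of relation triplets."""

    def __init__(self) -> None:
        # Maps an ordered name pair to the set of relation types seen for it.
        self._by_pair = defaultdict(set)

    def add(self, triplet: Triplet) -> None:
        # The pair is kept ordered because some relations are directional.
        self._by_pair[(triplet.person_a, triplet.person_b)].add(triplet.relation)

    def relations(self, person_a: str, person_b: str) -> Set[str]:
        # Returns an empty set when the pair is not in the knowledge base.
        return set(self._by_pair.get((person_a, person_b), ()))

    def relation_types(self) -> Set[str]:
        # All relation types collected so far; none are predefined.
        return {r for rels in self._by_pair.values() for r in rels}
```

For example, after `kb.add(Triplet("A", "B", "spouse"))`, calling `kb.relations("A", "B")` returns the relation types recorded for that pair. Collecting relation types from the data, rather than fixing them in advance, mirrors the stated goal of covering infrequent relationship types.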

[0029] When a person's name is submitted as a keyword to the Hudong Baike search portal, the returned introduction page contains structured character relationship data. For example, entering "XX" returns the people related to XX and their corresponding relationship types, as shown in Figure 2. Ten representative people from different fields are selected as the seeds of the name search, and the list of person relationships fo...

Abstract

The invention discloses a method that uses Chinese online resources to extract character relations under remote supervision. First, an online encyclopedia website, built in a semi-manual fashion, is used to construct a knowledge base automatically, yielding relation types that are accurate and as comprehensive as possible, together with examples of character relations. Next, co-occurring names and their context features are extracted from a text corpus, and the names are matched against the relation examples in the knowledge base to obtain a set of name pairs with marked relations and a set of name pairs with unmarked relations. Finally, a label propagation algorithm is introduced to assign relations to the unmarked name pairs, which completes the extraction of character relations. With this method, the character relation knowledge base is constructed automatically and contains richer and more accurate relation types. Because the label propagation algorithm performs the remotely supervised extraction on the basis of this knowledge base, the accuracy of the extracted relations can be ensured.
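The matching step in the abstract can be sketched as follows. This is only an illustration under stated assumptions: name recognition is taken as already done, each sentence arrives as a list of names plus a list of context tokens, name pairs are treated as unordered, and the knowledge base is reduced to a plain dictionary. The function name `match_pairs` and the input layout are hypothetical.

```python
from itertools import combinations


def match_pairs(sentences, kb):
    """Split co-occurring name pairs into marked and unmarked sets.

    sentences: list of (names, context_tokens) tuples, one per sentence.
    kb: dict mapping frozenset({name_a, name_b}) to a relation type
        (an assumed, simplified view of the knowledge base).
    """
    labeled, unlabeled = {}, {}
    for names, context in sentences:
        # Every pair of distinct names in one sentence counts as co-occurring.
        for a, b in combinations(sorted(set(names)), 2):
            relation = kb.get(frozenset((a, b)))
            target = labeled if relation is not None else unlabeled
            entry = target.setdefault((a, b), {"relation": relation, "contexts": []})
            # Keep the context so features can be computed per pair later.
            entry["contexts"].append(list(context))
    return labeled, unlabeled
```

Pairs found in the knowledge base carry its relation type as their mark; the remaining pairs keep only their contexts and are left for the label propagation step.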

Description

Technical field

[0001] The technical fields involved in the present invention include web page crawling, text preprocessing, feature extraction, character-pair similarity calculation, and the label propagation algorithm. Text preprocessing includes sentence segmentation, word segmentation, part-of-speech tagging, and name recognition. In general, the present invention is an effective method for extracting Chinese character relationships within the field of relation extraction: it draws on a large volume of online resources and adopts a remote supervised learning approach.

Background technique

[0002] In natural language processing (NLP), information extraction is an important research field that has been widely used in practice. Information extraction refers to extracting structured information from natural-language text to help people quickly find useful information in massive amounts of data....
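The character-pair similarity calculation and label propagation named in [0001] can be illustrated with a small sketch. The patent's exact features, similarity measure, and propagation variant are not given in this excerpt, so the code below assumes bag-of-words context features, cosine similarity, and a common textbook formulation of label propagation in which marked pairs stay clamped and each unmarked pair repeatedly takes the similarity-weighted average of its neighbours' label distributions. The function names and parameters are hypothetical.

```python
import math
from collections import Counter


def cosine(u, v):
    # Cosine similarity between two sparse bag-of-words vectors.
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def propagate_labels(features, seed_labels, iterations=50):
    """features: list of Counter, one per name pair.
    seed_labels: dict mapping pair index -> relation type for marked pairs.
    Returns one predicted relation type per pair."""
    if not seed_labels:
        raise ValueError("at least one marked pair is required")
    n = len(features)
    classes = sorted(set(seed_labels.values()))
    # Pairwise similarity graph without self-loops.
    w = [[cosine(features[i], features[j]) if i != j else 0.0 for j in range(n)]
         for i in range(n)]

    def clamp(i):
        # One-hot distribution for a marked pair.
        return [1.0 if seed_labels[i] == c else 0.0 for c in classes]

    uniform = [1.0 / len(classes)] * len(classes)
    y = [clamp(i) if i in seed_labels else list(uniform) for i in range(n)]
    for _ in range(iterations):
        new_y = []
        for i in range(n):
            total = sum(w[i])
            if i in seed_labels:
                new_y.append(clamp(i))      # marked pairs never change
            elif total == 0.0:
                new_y.append(y[i])          # isolated pair: nothing to absorb
            else:
                new_y.append([sum(w[i][j] * y[j][c] for j in range(n)) / total
                              for c in range(len(classes))])
        y = new_y
    return [classes[max(range(len(classes)), key=lambda c: y[i][c])] for i in range(n)]
```

In this sketch an unmarked pair whose contexts overlap only with those of a marked "spouse" pair converges to "spouse". A real system would likely need a confidence threshold so that weakly connected pairs are left unassigned instead of being forced into a class.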

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30, G06F17/27
CPC: G06F16/2458
Inventor: 杨静, 潘云, 郝娟, 杨辰翌, 黄保荃
Owner: EAST CHINA NORMAL UNIV