A Knowledge Extraction Method Based on Online Encyclopedia Linked Entities

A knowledge extraction technology based on linked entities, applied in special data processing applications, instruments, unstructured text data retrieval, etc., that addresses the problems of low extraction efficiency and high error rate.

Inactive Publication Date: 2017-01-18
FUDAN UNIV
5 Cites, 1 Cited by

AI Technical Summary

Problems solved by technology

[0005] The present invention addresses the low extraction efficiency and high error rate of traditional knowledge extraction. Because linked entities are marked out by users to distinguish them from other entities, the invention exploits the latent semantic relationship between linked entities and entries to provide an accurate and efficient structured knowledge extraction method based on linked entities.


Examples


Embodiment Construction

[0070] The present invention will be further described below in conjunction with the drawings and embodiments.

[0071] Figure 1 is a flowchart of the method of the present invention.

[0072] Figure 2 shows the distribution of the ranking weights used in evidence fusion against the position ratios of different linked entities in the two rankings. If an entity sits in the middle of both rankings, the two ranking weights are both close to 0.5.

[0073] The present invention compares the effects of PMI, WJC and the evidence fusion method, as shown in Figure 4. Figure 4 compares the performance of the different semantic similarity measures on the linked entities of "Steve Jobs" and "Apple Inc.". The closer a semantic relevance ranking is to the manual annotation, the better. Compared with PMI and WJC, the evidence fusion method in the figure is closer to the manual annotation.
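As an illustration of one of the baselines named above, the following is a minimal sketch of ranking linked entities by pointwise mutual information (PMI) with an entry. It uses the standard PMI definition, log(p(x, y) / (p(x) p(y))); the function names, the page counts, and the choice of page co-occurrence as the underlying statistic are all hypothetical and are not taken from the patent.

```python
import math


def pmi(count_xy, count_x, count_y, total):
    """Standard PMI: log( p(x, y) / (p(x) * p(y)) ), from raw counts."""
    if count_xy == 0:
        return float("-inf")  # never co-occur: lowest possible score
    p_xy = count_xy / total
    p_x = count_x / total
    p_y = count_y / total
    return math.log(p_xy / (p_x * p_y))


def rank_by_pmi(entry_count, linked_counts, cooccurrence, total):
    """Sort linked entities by PMI with the entry, most related first."""
    scores = {
        entity: pmi(cooccurrence.get(entity, 0), entry_count, count, total)
        for entity, count in linked_counts.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy counts (number of pages mentioning each item).
TOTAL_PAGES = 1000
ENTRY_COUNT = 100  # pages mentioning the entry, e.g. "Steve Jobs"
LINKED_COUNTS = {"Apple Inc.": 80, "California": 400, "1955": 200}
COOCCURRENCE = {"Apple Inc.": 60, "California": 50, "1955": 10}

ranking = rank_by_pmi(ENTRY_COUNT, LINKED_COUNTS, COOCCURRENCE, TOTAL_PAGES)
print(ranking)
```

A ranking produced this way could then be compared against a manually annotated ranking, which is the kind of comparison the figure reports.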

[0074] The present inventio...


Abstract

The invention belongs to the technical field of open knowledge extraction, and particularly relates to a knowledge extraction method based on online encyclopedia linked entities. First, irrelevant entities among the linked entities are removed by an evidence fusion method to obtain high-quality relevant linked entities. Next, the relevant linked entities are clustered by the G-means clustering method, a descriptive class label is generated for each cluster by an LCA-based class label generation method, and each cluster's entity set together with its class label forms one group of knowledge. Finally, a cluster reuse mechanism based on a maximum spanning tree is used to make clustering of large numbers of entities more efficient, which greatly reduces clustering time. Unlike traditional knowledge extraction methods, this method extracts knowledge from the linked entities of the online encyclopedia rather than from the text content, which largely avoids the high computational cost and high error rate of natural language processing methods, so that large-scale data can be processed efficiently.
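The class label step can be illustrated with a minimal sketch. It assumes that "LCA" stands for lowest common ancestor and that each entity maps to a node in a tree-shaped category taxonomy; the abstract confirms neither, and a real encyclopedia category system is usually a graph rather than a tree. The taxonomy, the function names, and the entity groups are hypothetical.

```python
def ancestors(node, parent):
    """Return the path from `node` up to the root, starting with `node`."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def lowest_common_ancestor(nodes, parent):
    """Deepest taxonomy node that is an ancestor of (or equal to) every node."""
    paths = [ancestors(n, parent) for n in nodes]
    common = set(paths[0]).intersection(*paths[1:])
    # Walking upward from the first node, the first shared node is the deepest.
    for node in paths[0]:
        if node in common:
            return node
    return None  # nodes live in disconnected trees


# Hypothetical toy taxonomy, stored as child -> parent (assumed to be a tree).
PARENT = {
    "iPhone": "Apple products",
    "iPad": "Apple products",
    "Macintosh": "Apple products",
    "Apple products": "Products",
    "Pixar": "Companies",
    "NeXT": "Companies",
    "Products": "Thing",
    "Companies": "Thing",
}

label_a = lowest_common_ancestor(["iPhone", "iPad", "Macintosh"], PARENT)
label_b = lowest_common_ancestor(["Pixar", "NeXT"], PARENT)
label_c = lowest_common_ancestor(["iPhone", "Pixar"], PARENT)
print(label_a, label_b, label_c)
```

Under these assumptions, a tight cluster receives a specific label, while a mixed cluster falls back to a more general ancestor, which is one reason filtering and clustering before labeling would matter.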

Description

Technical field

[0001] The invention belongs to the technical field of open knowledge extraction, and specifically relates to a knowledge extraction method based on online encyclopedia linked entities.

Background technique

[0002] Online encyclopedias, such as Wikipedia, are the most important open data resources on the Internet and provide the most authoritative and comprehensive source for knowledge acquisition. They are especially valuable in the data era because part of their data is structured and can be understood by machines. Structured data lets users grasp knowledge directly and is widely used in search engines, question answering, and similar applications.

[0003] The representative form of structured data in online encyclopedias is the Infobox (or attribute information table). However, current Infoboxes have some problems. First, Infoboxes are incomplete: nearly 55% of entries in Wikipedia do not have Infobox...


Application Information

Patent Type & Authority: Patents (China)
IPC: IPC(8): G06F17/30
CPC: G06F16/35
Inventor: 张可尊, 肖仰华, 汪卫
Owner: FUDAN UNIV