A knowledge graph construction method based on trusted web resources

A technology relating to web resources and knowledge graphs, applied in network data indexing, network data retrieval, semantic tool creation, and similar fields. It addresses the uncontrollable quality of crawled web resources and yields more accurate, more comprehensive, and higher-quality knowledge graphs.

Active Publication Date: 2020-12-04
中国科学院电子学研究所苏州研究院

AI Technical Summary

Problems solved by technology

However, because Internet resources vary in credibility, the quality of crawled web page resources cannot be controlled.




Embodiment Construction

[0027] The present invention is further described below with reference to specific embodiments and the accompanying drawings.

[0028] As shown in Figures 1-2, the knowledge graph construction method based on trusted web resources comprises the following steps:

[0029] Step 1: Initialize the trusted web resources.

[0030] For each encyclopedia page, obtain the links in its reference materials and record the topic of the corresponding encyclopedia page, building (encyclopedia page, topic, link) triples. Wikipedia provides public dump files, and Baidu Encyclopedia has corresponding datasets, so encyclopedia pages can be obtained either by parsing the dump files or by crawling. Each encyclopedia page carries the topic labels and the reference materials that correspond to that page. For example, Baidu Encyclopedia divides knowledge into 10 categories: people, nature, culture, sports, society, history, geography, science, entertainment, and life.
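The triple construction in Step 1 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the input schema (dicts with `title`, `topics`, and `references` keys), the `SeedTriple` type, and the `build_seed_triples` function are all hypothetical names, and the sample pages and URLs are made up. It assumes the encyclopedia pages have already been parsed from a dump or crawled.

```python
from typing import Iterable, List, NamedTuple


class SeedTriple(NamedTuple):
    """One (encyclopedia page, topic, link) record. Hypothetical type."""
    page: str   # title of the encyclopedia page
    topic: str  # topic label attached to that page
    link: str   # URL found in the page's reference materials


def build_seed_triples(pages: Iterable[dict]) -> List[SeedTriple]:
    """Emit one triple per (topic, reference link) pair of every page.

    Assumed schema: each page is a dict with 'title', 'topics' and
    'references' keys. Duplicate triples are dropped, order is preserved.
    """
    triples: List[SeedTriple] = []
    seen = set()
    for page in pages:
        for topic in page.get("topics", []):
            for link in page.get("references", []):
                triple = SeedTriple(page["title"], topic, link)
                if triple not in seen:
                    seen.add(triple)
                    triples.append(triple)
    return triples


# Made-up sample input; the second page lists the same reference twice.
pages = [
    {"title": "Marie Curie",
     "topics": ["people", "science"],
     "references": ["https://example.org/curie-bio"]},
    {"title": "Mount Everest",
     "topics": ["geography"],
     "references": ["https://example.org/everest",
                    "https://example.org/everest"]},
]
triples = build_seed_triples(pages)
for t in triples:
    print(t.page, t.topic, t.link)
```

Keeping the topic alongside each link means the crawled text of a linked page can later be paired with a topic label, which is what the topic model training set in Step 2 needs.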

[0031] Step 2: Obtain the training set for the topic model.

[0032] (1) The initializat...



Abstract

The invention discloses a knowledge graph construction method based on trusted web page resources. The method comprises: obtaining the links in the reference materials of each encyclopedia page and recording the topic of the corresponding encyclopedia page; crawling the text of each linked web page and the links it contains, and adding those links to a queue, to determine the training set of a topic model; training an LDA model on the crawled web page text and topic labels; crawling the web pages in the queue and adding the links they contain to the queue, where a web page whose link is a one-hop link is output directly for knowledge extraction, and otherwise the topic distribution of the page's document is computed with the LDA model and the pages are clustered; computing a TrustRank value for each web page within each cluster; selecting web pages for credibility labeling and training a knowledge source identification model in combination with features; and obtaining new web pages to be identified in batches, computing their features, and outputting for knowledge extraction the pages identified as knowledge sources. Knowledge graphs constructed by the method are more comprehensive and of higher quality.
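The per-cluster TrustRank computation named in the abstract can be illustrated with a small pure-Python sketch. It uses the standard formulation of TrustRank as a PageRank biased toward a trusted seed set; the abstract does not state the damping factor, iteration count, seed selection, or handling of pages without out-links, so the values and choices below are assumptions, as are the function name `trust_rank` and the toy link graph. Treating the encyclopedia-referenced pages as the trusted seeds is also an assumption made for illustration.

```python
from typing import Dict, Iterable, List


def trust_rank(out_links: Dict[str, List[str]],
               trusted_seeds: Iterable[str],
               damping: float = 0.85,
               iterations: int = 50) -> Dict[str, float]:
    """Biased PageRank over one cluster's link graph.

    Trust starts on the seed pages, flows along out-links, and the
    teleport step returns only to the seeds (assumed parameters).
    """
    nodes = set(out_links)
    for targets in out_links.values():
        nodes.update(targets)
    seeds = [n for n in nodes if n in set(trusted_seeds)]
    if not seeds:
        raise ValueError("at least one trusted seed must be in the graph")

    # Static trust vector: uniform over the seeds, zero elsewhere.
    seed_score = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    score = dict(seed_score)

    for _ in range(iterations):
        nxt = {n: (1.0 - damping) * seed_score[n] for n in nodes}
        for node in nodes:
            targets = out_links.get(node, [])
            if targets:
                share = damping * score[node] / len(targets)
                for target in targets:
                    nxt[target] += share
            else:
                # Assumed choice: a page without out-links hands its
                # trust back to the seeds so the total stays at 1.
                for seed in seeds:
                    nxt[seed] += damping * score[node] / len(seeds)
        score = nxt
    return score


# Toy cluster: "seed" is trusted, "spam" links in but nothing links to it.
cluster_links = {
    "seed": ["a", "b"],
    "a": ["b"],
    "b": [],
    "spam": ["a"],
}
scores = trust_rank(cluster_links, trusted_seeds={"seed"})
for page, value in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{page}: {value:.4f}")
```

In this toy graph the page that no trusted page links to receives no trust at all, which is the property that makes the score useful as a feature for the knowledge source identification model.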

Description

Technical field

[0001] The present invention relates to a knowledge graph construction method, and in particular to a knowledge graph construction method based on trusted web resources.

Background technique

[0002] Some knowledge extraction systems build knowledge graphs from online encyclopedias, extracting knowledge from the infoboxes or text of Wikipedia and Baidu Encyclopedia. Because the knowledge in online encyclopedias is reliable, the resulting knowledge graphs are of high quality. However, online encyclopedias make up only a small part of the Internet, so the sources of new knowledge are limited and the resulting knowledge graphs are one-sided. Other knowledge extraction systems, such as DeepDive and Knowledge Vault, crawl pages across the entire Internet for knowledge extraction in order to build large-scale knowledge graphs. However, because Internet resources vary in credibility, the quality of the crawled web resources cannot be controlled. Therefore, access to credi...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/36, G06F16/35, G06F16/951
CPC: G06F16/35, G06F16/367, G06F16/951
Inventor: 宋晓兆, 张楚一, 胡岩峰, 付啟明, 陈尚, 邓竟成
Owner 中国科学院电子学研究所苏州研究院