Knowledge graph data extraction method and device based on web crawler

A knowledge graph and web crawler technology, applied to the field of web crawler-based knowledge graph data extraction, readable storage media and computing devices. It addresses problems such as heavy workload, inconsistent web page formats and hard-to-maintain code, with the effect of improving efficiency.

Pending Publication Date: 2021-05-14
厦门渊亭信息科技有限公司

AI Technical Summary

Problems solved by technology

[0002] With the rapid development of the network, the World Wide Web has become the carrier of a large amount of information. When building a knowledge graph, the data provided by an enterprise may not meet existing business needs: first, the data may not be comprehensive enough; second, the data is time-sensitive and goes out of date. Enriching the database by crawling data from open-source websites is a good option. However, web page formats are not uniform, and even a single web page may contain different types of entities and relationships. Writing dedicated crawler code for each case to extract all kinds of data has the following disadvantages. First, the workload is heavy: corresponding parsing logic must be written for each entity and relationship on each page. Second, the code is hard to maintain: when the structure of a web page changes, the code must be adjusted accordingly.

Examples

Embodiment Construction

[0038] Exemplary embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. Although exemplary embodiments of the present invention are shown in the drawings, it should be understood that the invention may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided for more thorough understanding of the present invention and to fully convey the scope of the present invention to those skilled in the art.

[0039] FIG. 1 is a block diagram of an example computing device 100 arranged to implement a web crawler-based knowledge graph data extraction method according to the present invention. In a basic configuration 102, computing device 100 typically includes system memory 106 and one or more processors 104. A memory bus 108 may be used for communication between the processor 104 and the system memory 106.

[0040] Depending on the desired co...

Abstract

Embodiments of the invention provide a knowledge graph data extraction method and device based on a web crawler, together with a readable storage medium and a computing device. The aims are to reuse crawler code, to crawl webpage data deeply and automatically in batches, and to avoid having to modify large amounts of webpage analysis code when pages change. The method comprises the following steps: acquiring a target webpage from which data is to be crawled; configuring a crawling rule and an analysis rule for the target webpage; crawling the target webpage and the webpages linked from it according to the crawling rule; obtaining the entity information and relationship information contained in the target webpage and its linked webpages according to the analysis rule; and generating a knowledge graph from the entity information and the relationship information.
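The steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the rule formats (`CRAWL_RULE`, `ANALYSIS_RULE`), the in-memory `PAGES` dictionary standing in for real HTTP fetches, and the function names `crawl`, `analyze` and `build_graph` are all assumptions made for illustration. The point it shows is that the crawling and analysis logic is written once, and only the configured rules change from site to site.

```python
import re
from collections import deque

# Hypothetical in-memory "web" so the sketch runs without network access.
PAGES = {
    "/company/acme": '<h1 class="company">Acme</h1>'
                     '<a href="/person/alice">CEO</a>'
                     '<a href="/about">About</a>',
    "/person/alice": '<h1 class="person">Alice</h1>'
                     '<span class="employer">Acme</span>',
    "/about": "<p>About this site</p>",
}

# Crawling rule (assumed format): which links to follow and how deep to go.
CRAWL_RULE = {"follow": r"^/(company|person)/", "max_depth": 2}

# Analysis rule (assumed format): one regex per entity type and relation type.
ANALYSIS_RULE = {
    "entities": {
        "Company": r'<h1 class="company">(?P<name>[^<]+)</h1>',
        "Person": r'<h1 class="person">(?P<name>[^<]+)</h1>',
    },
    "relations": {
        "works_for": r'<h1 class="person">(?P<head>[^<]+)</h1>.*?'
                     r'<span class="employer">(?P<tail>[^<]+)</span>',
    },
}

def crawl(start, fetch, rule):
    """Breadth-first crawl from start, following only links the rule allows."""
    seen, queue, pages = {start}, deque([(start, 0)]), {}
    while queue:
        url, depth = queue.popleft()
        html = fetch(url)
        if html is None:          # page missing or fetch failed
            continue
        pages[url] = html
        if depth >= rule["max_depth"]:
            continue
        for link in re.findall(r'href="([^"]+)"', html):
            if link not in seen and re.search(rule["follow"], link):
                seen.add(link)
                queue.append((link, depth + 1))
    return pages

def analyze(pages, rule):
    """Apply the configured patterns to every crawled page."""
    entities, relations = set(), set()
    for html in pages.values():
        for etype, pattern in rule["entities"].items():
            for m in re.finditer(pattern, html):
                entities.add((etype, m.group("name")))
        for rtype, pattern in rule["relations"].items():
            for m in re.finditer(pattern, html, re.S):
                relations.add((m.group("head"), rtype, m.group("tail")))
    return entities, relations

def build_graph(entities, relations):
    """Adjacency-list graph: node name -> list of (relation, target name)."""
    graph = {name: [] for _, name in entities}
    for head, rtype, tail in relations:
        graph.setdefault(head, []).append((rtype, tail))
        graph.setdefault(tail, [])
    return graph

pages = crawl("/company/acme", PAGES.get, CRAWL_RULE)
entities, relations = analyze(pages, ANALYSIS_RULE)
graph = build_graph(entities, relations)
```

In this sketch, a change to a page's markup would be handled by editing the pattern in `ANALYSIS_RULE` rather than the crawler code itself. A real system would likely use an HTML parser with XPath or CSS selectors instead of regular expressions; regexes are used here only to keep the example dependency-free.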

Description

Technical field

[0001] The present invention relates to the technical field of artificial intelligence and automated machine learning, and in particular to a method, device, readable storage medium and computing device for extracting knowledge graph data based on a web crawler.

Background

[0002] With the rapid development of the network, the World Wide Web has become the carrier of a large amount of information. When building a knowledge graph, the data provided by an enterprise may not meet existing business needs: first, the data may not be comprehensive enough; second, the data is time-sensitive and goes out of date. Enriching the database by crawling data from open-source websites is a good option. However, web page formats are not uniform, and even a single web page may contain different types of entities and relationships. Writing dedicated crawler code for each case to extract all kinds of data has the following disadvantages: ...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/951, G06F16/36, G06F40/205
CPC: G06F16/951, G06F16/367, G06F40/205
Inventor: 洪万福, 钱智毅, 吴文杰
Owner: 厦门渊亭信息科技有限公司