A Domain Encyclopedia Construction System Based on Universal Encyclopedia Website

This domain encyclopedia technology, applied in the field of open knowledge extraction, addresses three problems: knowledge is scattered, manually building a domain encyclopedia is expensive, and domain encyclopedias therefore cannot be built in large numbers. It also has the effect of improving word segmentation accuracy.

Inactive. Publication Date: 2017-12-01
FUDAN UNIV
5 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

[0004] Encyclopedia data and knowledge are scattered, and building domain encyclopedias by hand is too expensive to be done in large numbers. To address these shortcomings, the present invention proposes a domain encyclopedia construction system based on general encyclopedia websites.

Examples

Embodiment Construction

[0099] The invention is further described below, taking as an example the construction of a Fudan University encyclopedia (that is, an encyclopedia whose entity entries all relate to Fudan University) from Baidu Encyclopedia data. See Figure 1 for the system module diagram. The modules of the system are applied in turn, as follows:

[0100] 1. Encyclopedia data crawling module

[0101] Distributed web crawlers crawl the online encyclopedia data. All encyclopedia data is crawled at this stage, not only the data for one particular field. The crawled pages are raw web page source code; see the sample shown in Figure 2. The raw pages are full of noise and must be preprocessed before use.
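The text does not say how the crawl is distributed. As an illustrative sketch only, one common approach is to shard URLs across workers with a stable hash so that each page is fetched by exactly one worker. The function names `assign_worker` and `partition`, the hashing scheme, and the example URLs are assumptions for illustration, not part of the patent.

```python
import hashlib


def assign_worker(url: str, n_workers: int) -> int:
    """Map a URL to a worker index using a stable hash (hypothetical scheme)."""
    digest = hashlib.sha1(url.encode("utf-8")).digest()
    # Use the first 8 bytes as an integer so the mapping is identical
    # across processes and machines (unlike Python's built-in hash()).
    return int.from_bytes(digest[:8], "big") % n_workers


def partition(urls, n_workers):
    """Group URLs into per-worker queues, skipping duplicates."""
    queues = [[] for _ in range(n_workers)]
    seen = set()
    for url in urls:
        if url in seen:
            continue
        seen.add(url)
        queues[assign_worker(url, n_workers)].append(url)
    return queues


if __name__ == "__main__":
    # Placeholder URLs, not real encyclopedia addresses.
    frontier = [
        "https://example.org/item/1",
        "https://example.org/item/2",
        "https://example.org/item/1",
    ]
    for index, queue in enumerate(partition(frontier, 2)):
        print(index, queue)
```

Hash-based sharding needs no coordination between workers, at the cost of uneven queues when the URL set is small.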

[0102] 2. Encyclopedia data preprocessing module

[0103] This module preprocesses the crawled online encyclopedia pages, for example by denoising them, so that the data meets the requirements for later use.

[0104] (1) ...
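The individual preprocessing steps are elided here. As a minimal sketch of what denoising raw page source might involve, the following strips markup and drops script and style content using Python's standard `html.parser`. The class name `TextExtractor`, the function `denoise`, and the choice of which tags count as noise are assumptions, not the patent's actual steps.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, ignoring the content of <script> and <style>."""

    SKIP = {"script", "style"}  # assumed noise tags for this sketch

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside skipped tags.
        if self._skip_depth == 0:
            text = data.strip()
            if text:
                self._chunks.append(text)

    def text(self):
        return " ".join(self._chunks)


def denoise(html: str) -> str:
    """Return the visible text of an HTML page as one space-joined string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


if __name__ == "__main__":
    sample = "<html><head><script>var x = 1;</script></head><body><h1>Title</h1></body></html>"
    print(denoise(sample))
```

A real pipeline would likely also remove navigation bars, advertisements, and templates specific to the encyclopedia site, which this sketch does not attempt.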

Abstract

The invention belongs to the technical field of open knowledge extraction and specifically relates to a domain encyclopedia construction system based on a general encyclopedia website. The system is divided into the following modules: an encyclopedia data crawling module, an encyclopedia data preprocessing module, a related entity search and sorting module, and an entity clustering module. The beneficial effects of the invention are as follows. Domain encyclopedias are currently built mostly by hand, which is time-consuming and labor-intensive, and it is impossible to find all related entities manually, so coverage is low. Building a domain encyclopedia from the domain-related entities found by the present invention can greatly reduce the manual effort involved and greatly increase coverage. In addition, a domain encyclopedia built with the system makes it much easier for users to obtain knowledge in a specific field, spares them the cumbersome search and screening process, and changes "users passively searching for information" into "the system actively providing information".
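The abstract names a related entity search and sorting module but does not describe its scoring. Purely as a hedged illustration of the idea, the sketch below ranks entities by how often their pages mention a seed entity plus how often the seed's page mentions them. The function `related_entities`, the mention-count score, and the sample pages are all hypothetical and should not be read as the patented method.

```python
from typing import Dict, List, Tuple


def related_entities(pages: Dict[str, str], seed: str) -> List[Tuple[str, int]]:
    """Rank entities by mutual mention counts with the seed (assumed score).

    pages maps an entity name to the denoised text of its encyclopedia page.
    """
    seed_text = pages.get(seed, "")
    scored = []
    for name, text in pages.items():
        if name == seed:
            continue
        # Mentions of the seed on this page, plus mentions of this
        # entity on the seed's own page.
        score = text.count(seed) + seed_text.count(name)
        if score > 0:
            scored.append((name, score))
    # Highest score first; ties broken alphabetically for determinism.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored


if __name__ == "__main__":
    # Made-up pages for illustration only.
    sample_pages = {
        "Fudan University": "Fudan University is in Shanghai. Alumni include Person A.",
        "Person A": "Person A studied at Fudan University.",
        "Person C": "Person C is unrelated.",
    }
    print(related_entities(sample_pages, "Fudan University"))
```

Counting mentions in both directions is what lets such a sketch surface entities that never appear on the seed's own page, which is the gap the background section describes.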

Description

Technical field

[0001] The invention relates to a domain encyclopedia construction system based on a general encyclopedia website, and belongs to the technical field of open knowledge extraction.

Background

[0002] In recent years, many online encyclopedia websites, such as Baidu Encyclopedia and Wikipedia, have emerged, making it much easier for users to obtain information. Users can search for the information they need through the built-in search engine. In general, a user who queries an entity is often also interested in entities related to it, or the search is aimed directly at all entities related to a given entity, for example all people related to Fudan University. Current encyclopedia websites cannot serve this purpose. For example, if you search for all the people related to Fudan University, you can only find the people appearing in the corresponding webpage of Fudan Unive...

Application Information

Patent Type & Authority: Patents (China)
IPC: IPC(8): G06F17/30, G06K9/62
CPC: G06F16/951, G06F40/279, G06F18/214
Inventor: 覃华峥, 肖仰华, 汪卫
Owner: FUDAN UNIV