Field encyclopedia establishment system based on general encyclopedia websites

The technology, applied in the field of open knowledge extraction, addresses the problems of scattered encyclopedia knowledge, the high cost of manually building a domain encyclopedia, and the resulting inability to build such encyclopedias in large numbers. It is also stated to improve the accuracy of word segmentation.

Inactive Publication Date: 2015-03-11
FUDAN UNIV
5 Cites, 29 Cited by

AI Technical Summary

Problems solved by technology

[0004] Encyclopedia data and knowledge are scattered, and building domain encyclopedias by hand is too expensive to be done at scale. To address these shortcomings, the present invention proposes a domain encyclopedia construction system based on general encyclopedia websites.

Examples

Embodiment Construction

[0099] The present invention is further described below, using as an example the construction of a Fudan University encyclopedia (that is, an encyclopedia whose entity entries are all related to Fudan University) from Baidu Encyclopedia data. For the system module diagram, see Figure 1. The modules of the system are applied in turn, as follows:

[0100] 1. Encyclopedia data crawling module

[0101] Distributed web crawlers crawl the online encyclopedia data. All encyclopedia data is crawled here, not just the data for a particular field. The crawled pages are raw web page source code; see the sample shown in Figure 2. The data in the original pages is full of noise and must be preprocessed before use.
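The source does not describe how the crawler is implemented. As a rough illustration only, the sketch below shows a single-process breadth-first crawl frontier that stores raw page source per URL. It is a stand-in for the distributed crawler, not the patent's design: the names `crawl`, `fetch`, `extract_links`, and `max_pages` are hypothetical, and the in-memory `FAKE_SITE` replaces real network access so the example runs offline.

```python
import re
from collections import deque
from typing import Callable, Dict, Iterable, List


def crawl(seeds: Iterable[str],
          fetch: Callable[[str], str],
          extract_links: Callable[[str], List[str]],
          max_pages: int = 1000) -> Dict[str, str]:
    """Breadth-first crawl returning raw page source keyed by URL."""
    frontier = deque(seeds)
    seen = set(frontier)              # URLs already queued or visited
    pages: Dict[str, str] = {}
    while frontier and len(pages) < max_pages:
        url = frontier.popleft()
        html = fetch(url)             # raw, still noisy page source
        pages[url] = html
        for link in extract_links(html):
            if link not in seen:      # de-duplicate before queueing
                seen.add(link)
                frontier.append(link)
    return pages


# Hypothetical three-page site used instead of real HTTP requests.
FAKE_SITE = {
    "/a": '<a href="/b">B</a><a href="/c">C</a>',
    "/b": '<a href="/a">A</a>',
    "/c": "",
}


def fake_fetch(url: str) -> str:
    return FAKE_SITE.get(url, "")


def href_links(html: str) -> List[str]:
    # Naive link extraction, adequate only for this toy input.
    return re.findall(r'href="([^"]+)"', html)


pages = crawl(["/a"], fake_fetch, href_links)
print(list(pages))
```

In a distributed setting the frontier and the `seen` set would typically be shared or partitioned across workers; that part is deliberately left out here.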

[0102] 2. Encyclopedia data preprocessing module

[0103] This module preprocesses the crawled online encyclopedia pages, for example by denoising them, so that the data meets the requirements for use.
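The individual preprocessing steps are elided in this excerpt, so the following is only a minimal sketch of one plausible denoising step: stripping markup, scripts, and styles from raw page source to leave visible text. It uses Python's standard `html.parser`; the class `TextExtractor` and the function `denoise` are hypothetical names, and the sample page is made up for the example.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text and ignores <script> and <style> content."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0   # > 0 while inside a skipped element
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside skipped elements.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self) -> str:
        return " ".join(self._chunks)


def denoise(html: str) -> str:
    """Return the visible text of an HTML page as one string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


raw = ("<html><head><style>p{color:red}</style>"
       "<script>var x=1;</script></head>"
       "<body><h1>Fudan University</h1>"
       "<p>Founded in 1905.</p></body></html>")
print(denoise(raw))
```

A real encyclopedia page would also need site-specific handling (navigation bars, edit links, infobox tables), which this sketch does not attempt.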

[0104] (1) ...

Abstract

The invention belongs to the technical field of open knowledge extraction and specifically relates to a field encyclopedia establishment system based on general encyclopedia websites. The system is divided into four modules: an encyclopedia data crawling module, an encyclopedia data preprocessing module, a related entity searching and ranking module, and an entity clustering module. The system has the following beneficial effects. At present, field encyclopedias are mostly established manually, which costs time and labor, and because not all related entities can be found manually, coverage is low. When the field encyclopedia is instead built from the field-related entities found by the system, the labor required is greatly reduced and coverage is greatly increased. The resulting field encyclopedia also makes it much more convenient for users to obtain knowledge in a specified field: complex searching and screening are no longer needed, and the pattern in which a user passively searches for information becomes one in which the system actively provides it.
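The abstract names a related entity searching and ranking module, but this excerpt does not say how relatedness is scored. Purely as an illustration, and as an assumption rather than the patent's method, the sketch below ranks entries by the Jaccard overlap between their hyperlink neighborhoods and that of a seed entry. The function `rank_related`, the `links` mapping, and the sample entries are all hypothetical.

```python
from typing import Dict, List, Set, Tuple


def rank_related(seed: str,
                 links: Dict[str, Set[str]]) -> List[Tuple[str, float]]:
    """Rank entries by Jaccard overlap of link neighborhoods with the seed.

    `links` maps an entry title to the set of entry titles it links to.
    Each neighborhood is the entry's out-links plus the entry itself.
    This scoring rule is an assumption made for illustration.
    """
    seed_nbrs = links.get(seed, set()) | {seed}
    scored = []
    for entity, out in links.items():
        if entity == seed:
            continue
        nbrs = out | {entity}
        score = len(seed_nbrs & nbrs) / len(seed_nbrs | nbrs)
        if score > 0:                 # drop entries with no overlap
            scored.append((entity, score))
    # Highest score first; ties broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored


# Made-up link graph standing in for preprocessed encyclopedia pages.
links = {
    "Fudan University": {"Shanghai", "Person A"},
    "Person A": {"Fudan University", "Shanghai"},
    "Shanghai": {"China"},
    "Unrelated": {"Elsewhere"},
}
ranking = rank_related("Fudan University", links)
print(ranking)
```

The ranked list could then be passed to a clustering step (for example, grouping entities into people, places, and organizations), which is not sketched here.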

Description

Technical field

[0001] The invention relates to a field encyclopedia construction system based on a general encyclopedia website, and belongs to the technical field of open knowledge extraction.

Background technique

[0002] In recent years, many online encyclopedia websites, such as Baidu Encyclopedia and Wikipedia, have emerged and made it much easier for users to obtain information. Users can search for the information they need through the built-in search engine. Generally speaking, when a user queries an entity, he or she is often also interested in entities related to it, or the purpose of the search is precisely to find all entities related to a given entity, such as all people related to Fudan University. Current encyclopedia websites cannot serve this purpose. For example, if you search for all the people related to Fudan University, you can only search for the people appearing in the corresponding webpage of Fudan Unive...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30, G06K9/62
CPC: G06F16/951, G06F40/279, G06F18/214
Inventor: 覃华峥, 肖仰华, 汪卫
Owner: FUDAN UNIV