Search system based on encyclopedia data extraction and integration

A data extraction and query system technology, applied in electrical digital data processing, special data processing applications, instruments, etc. It addresses the problems that keyword-based retrieval cannot find structured information and cannot reflect the characteristics of structured data, and it achieves fast query speed, high data quality and simple operation.

Status: Inactive. Publication Date: 2009-12-30
PEKING UNIV

AI Technical Summary

Problems solved by technology

[0003] Traditional keyword-based information retrieval has several defects. On the one hand, content in web pages is organized in increasingly diverse forms, and keyword-based search cannot reflect the characteristics of structured data. On the other hand, more and more data exists in web pages in structured form, especially in descriptive web documents such as encyclopedia pages.
Therefore, traditional keyword-based information retrieval can no longer meet the need to find structured information.



Embodiment Construction

[0014] The present invention will be described in detail below in conjunction with the accompanying drawings and embodiments.

[0015] The system of the present invention includes a data extraction module 1, a data integration module 2 and a data query module 3. As shown in Figure 1, the data extraction module 1 includes a document extraction and filtering module 11, a metadata category identification module 12, a table data location module 13, a location and extraction module 14, an identification type module 15, a form analysis module 16, a feature function module 17 and a relationship type identification module 18. The document extraction and filtering module 11 extracts the encyclopedia web pages required by the user from encyclopedia databases on the Internet and then filters these documents, that is, it removes web pages whose topic is unrelated to the subject of the user's query, such as advertisements, web pages wit...
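The filtering step of module 11 can be illustrated with a minimal Python sketch. The source does not specify how relevance is judged, so the term-overlap test below is purely an assumption, and the names `Page`, `is_relevant`, `filter_pages` and the `min_hits` threshold are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Page:
    """A fetched encyclopedia page (hypothetical minimal representation)."""
    url: str
    title: str
    text: str


def is_relevant(page: Page, topic_terms: Iterable[str], min_hits: int = 1) -> bool:
    """Assumed relevance test: count how many topic terms occur in the page."""
    words = set((page.title + " " + page.text).lower().split())
    hits = sum(1 for term in topic_terms if term.lower() in words)
    return hits >= min_hits


def filter_pages(pages: Iterable[Page], topic_terms: Iterable[str],
                 min_hits: int = 1) -> List[Page]:
    """Keep only pages that pass the relevance test; off-topic pages are dropped."""
    terms = list(topic_terms)  # materialize so the terms can be reused for every page
    return [p for p in pages if is_relevant(p, terms, min_hits)]
```

A real filter would likely need tokenization that handles punctuation and non-English text; whitespace splitting is used here only to keep the sketch short.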


Abstract

The invention relates to a search system based on encyclopedia data extraction and integration. The system comprises a data extraction module, a data integration module and a data query module. The data extraction module extracts encyclopedia web pages from the Internet, locates and preliminarily filters the tables in those pages, locates and extracts the tables based on visual features, converts the extracted tables uniformly into list form, groups tables with the same feature parameters into one category, extracts and identifies the classification information for the tables of each category, and stores the classified information in an information database and an XML database. The data integration module classifies and labels the tables by category, uses an integration method to merge tables with the same attributes into the same schema library, clusters the schema information in each schema library, and outputs the schema clusters and a recommended schema. The data query module searches the information database for the corresponding table information and outputs the query results together with the recommended schema.
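The flow described in the abstract (convert extracted tables to list form, merge tables that share the same attributes into one library, then query the stored table information) can be sketched in Python as follows. This is a simplified illustration, not the patented method: grouping by identical attribute sets stands in for the integration and clustering steps, an in-memory dict stands in for the information database, and all function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Sequence, Tuple

Record = Dict[str, str]


def to_list_form(rows: Sequence[Tuple[str, str]]) -> Record:
    """Convert a two-column table (attribute, value) into attribute-value pairs."""
    return {attr.strip(): value.strip() for attr, value in rows}


def group_by_attributes(records: Dict[str, Record]) -> Dict[FrozenSet[str], List[str]]:
    """Put entries whose tables have identical attribute sets into one library."""
    libraries: Dict[FrozenSet[str], List[str]] = defaultdict(list)
    for entry, record in records.items():
        libraries[frozenset(record)].append(entry)
    return dict(libraries)


def query(records: Dict[str, Record], conditions: Record) -> List[str]:
    """Return entries whose record matches every attribute-value condition."""
    return sorted(
        entry for entry, record in records.items()
        if all(record.get(attr) == value for attr, value in conditions.items())
    )
```

Unlike a keyword search, `query` matches on attribute-value pairs, which is the kind of structured lookup the system is meant to support. Attribute names that differ only in wording would land in different libraries here; reconciling them is the part this sketch leaves out.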

Description

Technical field

[0001] The invention relates to a data retrieval system, in particular to a query system based on encyclopedia data extraction and integration.

Background technique

[0002] With the rapid development of network information technology, the amount of data on the Internet has grown explosively. Users increasingly expect a query system to present information directly in structured form, for example when querying the performance parameters of products in the same category or querying weather information. How to retrieve the required network data quickly and effectively has therefore received extensive attention. Current query technology is keyword-based information retrieval.

[0003] The traditional keyword-based information retrieval technology has several defects: on the one hand, the content organization forms in web pages are becoming more and more diverse, and keyword-based search canno...


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/30
Inventor: 伍伟, 高军, 王腾蛟, 杨冬青
Owner: PEKING UNIV