Data mining method based on extraction of Web numerical value tables

A data mining and table technology applied in the fields of electrical digital data processing, special-purpose data processing applications, instruments, and the like. It addresses the lack of a table search function and the lack of in-depth processing of table content.

Active Publication Date: 2009-10-14
TONGFANG KNOWLEDGE NETWORK TECH CO LTD (BEIJING)
Cites: 0; Cited by: 57

AI Technical Summary

Problems solved by technology

[0002] Text, tables, and multimedia files (pictures, videos, etc.) are the main forms of Web information. At present, general-purpose Web search engines provide no dedicated table search function and do not process table content in depth.

Detailed Description of Embodiments

[0045] To make the objectives, technical solutions, and advantages of the present invention clearer, the implementation of the invention is described in further detail below with reference to the accompanying drawings:

[0046] This embodiment provides a data mining method based on Web numerical table extraction.

[0047] As shown in Figure 1, a data mining method based on Web numerical table extraction is provided, comprising the following steps:

[0048] Step 10: form the basic set of the domain knowledge base from the Web numerical table sample set;

[0049] A supervised machine learning method is trained on the preprocessed Web numerical table sample set, and the result forms the basic set of the domain knowledge base.
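The patent states only that supervised machine learning is applied to the preprocessed samples; it does not name the learner, the features, or the labels. As a minimal sketch under those gaps, the Python below assumes each sample table has already been preprocessed into a string of header and caption words and labeled with a domain, and trains a multinomial naive Bayes classifier, a swapped-in illustrative choice rather than the patent's method. The class name, the `domain_terms` helper, and the toy samples are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Split preprocessed table text (headers, captions) into lowercase tokens."""
    return text.lower().split()


class NaiveBayesDomainClassifier:
    """Multinomial naive Bayes with Laplace (add-one) smoothing.

    Hypothetical stand-in for the unspecified supervised learner.
    """

    def fit(self, samples):
        # samples: list of (table_text, domain_label) pairs
        self.class_counts = Counter(label for _, label in samples)
        self.token_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in samples:
            tokens = tokenize(text)
            self.token_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.n_samples = len(samples)
        return self

    def predict(self, text):
        """Return the domain label with the highest log-posterior."""
        tokens = tokenize(text)
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            counts = self.token_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n / self.n_samples)  # log prior
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)  # smoothed likelihood
            if score > best_score:
                best_label, best_score = label, score
        return best_label

    def domain_terms(self, label, top_k=5):
        """Most frequent training terms for a domain (a crude seed vocabulary)."""
        return [tok for tok, _ in self.token_counts[label].most_common(top_k)]


# Toy, made-up training samples: (preprocessed table text, domain label).
samples = [
    ("province gdp billion yuan growth rate", "economy"),
    ("year gdp per capita yuan", "economy"),
    ("crop yield tons per hectare", "agriculture"),
    ("grain output tons province", "agriculture"),
]
clf = NaiveBayesDomainClassifier().fit(samples)
print(clf.predict("gdp growth by province"))  # economy
print(clf.domain_terms("economy", top_k=2))
```

Reading the most frequent terms per domain off the trained counts is one crude way to seed a knowledge-base vocabulary; what the patent's "basic set" actually contains is not specified in this section.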

[0050] Step 20: provide the domain knowledge required for extraction;
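The text does not say what form the domain knowledge takes. One plausible form, assumed here purely for illustration, is a lexicon that maps the surface forms found in table headers to canonical attribute names, plus a rule that splits off a trailing unit in parentheses. The lexicon contents, `ALIAS_INDEX`, and `match_header` are hypothetical.

```python
import re

# Hypothetical domain knowledge for an illustrative "economy" domain:
# canonical attribute name -> surface forms that may appear in table headers.
DOMAIN_ATTRIBUTES = {
    "GDP": ["gdp", "gross domestic product"],
    "Population": ["population", "resident population"],
}

# Invert the lexicon once so each header lookup is a single dict access.
ALIAS_INDEX = {
    alias: canonical
    for canonical, aliases in DOMAIN_ATTRIBUTES.items()
    for alias in aliases
}

# A trailing "(unit)" at the end of a header cell, e.g. "GDP (billion yuan)".
UNIT_SUFFIX = re.compile(r"\(([^)]*)\)\s*$")


def match_header(header):
    """Return (canonical attribute or None, unit or None) for a raw header cell."""
    text = header.strip().lower()
    unit = None
    m = UNIT_SUFFIX.search(text)
    if m:
        unit = m.group(1).strip()
        text = text[: m.start()].strip()
    return ALIAS_INDEX.get(text), unit


print(match_header("GDP (billion yuan)"))   # ('GDP', 'billion yuan')
print(match_header("Resident Population"))  # ('Population', None)
print(match_header("Rainfall (mm)"))        # (None, 'mm')
```

Keeping the lexicon as plain data means it can be extended by a later learning step without changing the matching code.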

[0051] Step 30: locate and extract the numerical tables in Web pages to obtain the domain Web numerical table set;

[0052] Locate and extract...

Abstract

The invention discloses a data mining method based on the extraction of Web numerical tables. The method is built on a domain knowledge base and takes the generation of a numerical knowledge element base as its basic goal. It mainly comprises: construction of the basic set of the domain knowledge base, location of Web numerical tables, recognition of table structure, integration of table content, semantic representation of the extraction results, data retrieval, automatic learning of domain knowledge, and data mining processing. Working on Web numerical tables in a specific domain, the invention extracts the data, information, and knowledge contained in the numerical tables of Web pages, converts this semi-structured data into structured data, and provides services such as data retrieval and data mining analysis. The method can completely and accurately extract valuable numerical knowledge from the large number of Web numerical tables scattered across the Web, meeting users' needs for data query and data analysis.
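To make "converts semi-structured data into structured data" concrete, the sketch below assumes the simplest table structure, a header row of column attributes and a header column of row subjects, and turns each numeric cell into a (subject, attribute, value) record that can then be retrieved by exact match. `KnowledgeElement` is a hypothetical stand-in for the patent's numerical knowledge element, whose exact form the abstract does not give; tables with multi-level headers or merged cells would need a fuller structure-recognition step.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class KnowledgeElement:
    """One numerical fact (hypothetical schema): subject, attribute, value."""
    subject: str
    attribute: str
    value: float


def parse_number(cell: str) -> Optional[float]:
    """Parse '1,234.5' or '12%' (as 12.0) to a float; None for non-numeric cells."""
    try:
        return float(cell.replace(",", "").rstrip("%"))
    except ValueError:
        return None


def table_to_elements(table: List[List[str]]) -> List[KnowledgeElement]:
    """Assume row 0 holds column attributes and column 0 holds row subjects."""
    header = table[0]
    elements = []
    for row in table[1:]:
        subject = row[0]
        for attribute, cell in zip(header[1:], row[1:]):
            value = parse_number(cell)
            if value is not None:  # skip empty or placeholder cells such as "-"
                elements.append(KnowledgeElement(subject, attribute, value))
    return elements


def retrieve(elements, subject=None, attribute=None):
    """Exact-match retrieval; None means 'any'."""
    return [
        e for e in elements
        if (subject is None or e.subject == subject)
        and (attribute is None or e.attribute == attribute)
    ]


# Made-up grid, e.g. the output of a table-location step.
grid = [
    ["Region", "2007", "2008"],
    ["Region A", "120.5", "130.2"],
    ["Region B", "98.0", "-"],
]
elements = table_to_elements(grid)
print(len(elements))  # 3
print(retrieve(elements, attribute="2008"))
```

Once values are records rather than cells, retrieval and mining (filtering, grouping, comparing across tables) reduce to ordinary operations on structured data.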

Description

Technical Field

[0001] The invention relates to the technical field of Web numerical table extraction and data mining, and in particular to a data mining method based on Web numerical table extraction.

Background

[0002] Text, tables, and multimedia files (pictures, videos, etc.) are the main forms of Web information. At present, general-purpose Web search engines provide no dedicated table search function and do not process table content in depth. Tables in Web documents are dense carriers of data and knowledge and make up a large share of learning, research, and information Web pages. Among them, Web numerical tables (Web Numeric-Tables), such as numerical lists and statistical reports, contain a wealth of domain numerical knowledge. Extracting and mining numerical knowledge from massive collections of Web numerical tables is of great significance for table search, data query, and data analysis. Web numerical table extraction is to extract semantically consistent ...

Application Information

Patent Type & Authority: Application (China)
IPC (8th edition): G06F17/30
Inventors: 赵洪, 肖洪, 吴晨, 薛德军
Owner: TONGFANG KNOWLEDGE NETWORK TECH CO LTD (BEIJING)