Domain-oriented, sample-based method and system for extracting structured data from the Internet

An Internet structured data extraction technology, applied in semi-structured data retrieval, semi-structured data mapping/transformation, network data indexing, etc. It addresses the problems that existing extraction methods are complex and cannot effectively improve data extraction efficiency, and has the effect of improving efficiency.

Inactive Publication Date: 2007-04-25
关涛
Cites: 2; Cited by: 17

AI Technical Summary

Problems solved by technology

[0005] In short, the extraction method described is relatively complicated and cannot effectively improve the efficiency of data extraction. It is only effective for specific information extraction in specific fields or within a small range.



Examples


Embodiment Construction

[0036] Figure 1 shows a display interface of the system of the present invention. The display interface 10 includes: a web address input bar 100, a title bar 200, an information display unit 300, a type input window 400, and a function key unit 500. The function key unit 500 further includes a "collect" key 51, an "analysis" key 52, an "extract" key 53, an "area" key 54, a "house type" key 55, an "area" key 56, and a "price" key 57.

[0037] The "collect" key 51, the "analysis" key 52, and the "extract" key 53 are the basic keys of the system of the present invention and are used in all fields. The "collect" key 51 starts the sample collection process, that is, the process of collecting samples obtained by the user; the "analysis" key 52 starts the sample analysis process, that is, extracting sample features from the webpage displayed by the information display unit 300; the "extract" key 53 is used to start data extraction The process of integrati...
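The relationship between the three basic keys and the processes they start can be pictured as a simple dispatch table. The following is a minimal sketch only: the class name `DisplayInterface`, the `press` method, and the log messages are hypothetical names introduced for illustration and do not come from the patent. Only the three basic keys are modelled.

```python
from typing import Callable, Dict, List


class DisplayInterface:
    """Hypothetical model of the function key unit: each basic key starts one process."""

    def __init__(self) -> None:
        self.log: List[str] = []
        # Basic keys, described in the text as used in all fields.
        self.basic_keys: Dict[str, Callable[[], None]] = {
            "collect": self.start_collection,   # key 51
            "analysis": self.start_analysis,    # key 52
            "extract": self.start_extraction,   # key 53
        }

    def start_collection(self) -> None:
        self.log.append("sample collection started")

    def start_analysis(self) -> None:
        self.log.append("sample analysis started")

    def start_extraction(self) -> None:
        self.log.append("data extraction started")

    def press(self, key: str) -> None:
        """Look up the handler for a key label and run it."""
        handler = self.basic_keys.get(key)
        if handler is None:
            raise KeyError(f"no basic key named {key!r}")
        handler()


ui = DisplayInterface()
ui.press("collect")
ui.press("analysis")
ui.press("extract")
print(ui.log)
```

A table of handlers keeps the basic keys separate from whatever additional keys a particular field might define, which matches the text's statement that the basic keys are shared across all fields.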



Abstract

This invention discloses a sample-based method and system for extracting structured data from the Internet. The method comprises the following steps: a sample collection step, which automatically obtains samples by recording the user's visit data; a sample analysis step, which analyzes the samples using a language knowledge database; a data extraction step, which reads multiple web pages through the HTTP protocol or by driving the Internet; and a data integration step, which converts the sample feature information or the matched data into one unified form.
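The four steps named in the abstract can be illustrated as a small pipeline. This is a sketch under stated assumptions, not the patented implementation: the `Sample` class, the function names, and the regular-expression `KNOWLEDGE_BASE` (a stand-in for the language knowledge database) are all hypothetical. Pages are passed in as already-fetched `(url, html)` pairs so the example runs offline; in practice they could be fetched over HTTP, for example with `urllib.request.urlopen`.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Pattern, Tuple

Page = Tuple[str, str]  # (url, html)


@dataclass
class Sample:
    url: str
    html: str
    features: Dict[str, Pattern[str]] = field(default_factory=dict)


# Hypothetical stand-in for the language knowledge database: field name -> pattern.
KNOWLEDGE_BASE: Dict[str, Pattern[str]] = {
    "price": re.compile(r"(\d+(?:\.\d+)?)\s*yuan"),
    "area": re.compile(r"(\d+(?:\.\d+)?)\s*sqm"),
}


def collect_samples(visit_log: List[Page]) -> List[Sample]:
    """Sample collection: turn recorded user visits into samples."""
    return [Sample(url=url, html=html) for url, html in visit_log]


def analyze_sample(sample: Sample, kb: Dict[str, Pattern[str]]) -> Sample:
    """Sample analysis: keep the knowledge-base patterns that occur in the sample."""
    for name, pattern in kb.items():
        if pattern.search(sample.html):
            sample.features[name] = pattern
    return sample


def extract_data(pages: List[Page], features: Dict[str, Pattern[str]]) -> List[dict]:
    """Data extraction: apply the sample's features to other pages."""
    records = []
    for url, html in pages:
        record = {"url": url}
        for name, pattern in features.items():
            match = pattern.search(html)
            if match:
                record[name] = match.group(1)
        records.append(record)
    return records


def integrate(records: List[dict], fields: List[str]) -> List[Dict[str, Optional[float]]]:
    """Data integration: convert every record to one unified form (missing -> None)."""
    return [
        {"url": r["url"], **{f: float(r[f]) if f in r else None for f in fields}}
        for r in records
    ]


visit_log = [("http://example.com/a", "<p>Price: 5000 yuan, 80 sqm</p>")]
sample = analyze_sample(collect_samples(visit_log)[0], KNOWLEDGE_BASE)
pages = [("http://example.com/b", "<p>6200 yuan</p>")]
unified = integrate(extract_data(pages, sample.features), ["price", "area"])
print(unified)
```

Keeping each step as a separate function mirrors the step-by-step structure of the abstract; the unified record format always carries every field, so downstream code does not need to handle pages that lack a value differently.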

Description

Technical field

[0001] The invention relates to the field of computer applications, in particular to a field-oriented, sample-based method and system for automatically extracting structured data from the Internet.

Background technique

[0002] Data extraction technology is a technology that uses computers to extract valid data from free and semi-free texts according to certain rules, organize the data, and show it to users. Data extraction in a specific field is guided by domain-related knowledge: manually labeled, regular sample sets are used for training, so that the abstraction level and coverage of the rules in the data extraction mechanism reach the most reasonable level, and data extraction is then performed on text outside the sample set.

[0003] The Chinese Patent Document (Publication / Announcement No. CN1410918) discloses a search engine based on information extraction technology, which mainly uses machine learning methods to learn from a sample set of HTML pages that contain similar informa...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
CPC: G06F17/30914, H04L29/08675, G06F17/30864, H04L67/02, H04L67/22, H04L29/0809, G06F16/84, G06F16/951, H04L67/535
Inventor: 关涛
Owner: 关涛