Sorting technique of deep web database only providing simple query interface

A classification method for databases with simple query interfaces, applied in the field of information retrieval, that addresses the problem that deep web database classification has not been well solved.

Inactive Publication Date: 2010-12-22
崔志明 +2

AI Technical Summary

Problems solved by technology

[0007] In summary, existing work has not yet solved the classification problem for deep web databases, in particular for structured deep web databases that provide only a simple query interface.



Examples


Embodiment

[0073] Example: as shown in Figures 1 to 5, a method for classifying deep web databases that provide only a simple query interface includes the following steps:

[0074] 1. Obtain domain query samples;

[0075] The domain query samples are selected from 600 documents in the book domain (these documents include book database record documents, example documents, and documents related to the book domain in public catalogues). First, the document set is preprocessed, including word segmentation, stop-word removal, and word frequency statistics. After preprocessing, document frequency (DF) is used to remove low-frequency words, and the 200 feature words with the largest weights are selected to form the domain query samples. The feature word weights are computed with the TF-IDF method.
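The selection step in [0075] can be sketched roughly as follows. This is only an illustrative sketch, not the patent's implementation: the function name `select_query_samples`, the `min_df` cutoff, the smoothed IDF formula, and the choice to weight each word by its corpus-wide term frequency times IDF are all assumptions, since the text does not specify them. Documents are assumed to be already segmented and stripped of stop words.

```python
import math
from collections import Counter


def select_query_samples(docs, min_df=2, top_k=200):
    """Pick the top_k feature words by a TF-IDF style weight.

    docs   : list of token lists (already segmented, stop words removed)
    min_df : assumed cutoff; words appearing in fewer documents are dropped
    top_k  : number of feature words to keep (200 in the described example)
    """
    n_docs = len(docs)

    # Document frequency: in how many documents each word occurs.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))

    # Term frequency over the whole document set.
    tf = Counter()
    for tokens in docs:
        tf.update(tokens)

    weights = {}
    for term, doc_count in df.items():
        if doc_count < min_df:
            continue  # DF filter removes low-frequency words
        # Assumed weighting: corpus term frequency times a smoothed IDF.
        idf = math.log(n_docs / doc_count) + 1.0
        weights[term] = tf[term] * idf

    # Sort by descending weight; ties are broken alphabetically.
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:top_k]]
```

With the 600 book-domain documents described above, a call such as `select_query_samples(docs, top_k=200)` would return the 200 candidate query words; the actual DF cutoff used in the patent is not stated.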

[0076] The construction of domain query samples is existing technology; see, for example, the literature by Zhiguo Gong, Jingbai Zhang, and ...


Abstract

The invention discloses a method for classifying deep web databases that provide only a simple query interface. The method comprises the following steps: taking the result model and the content of the result web page data area of the deep web database as two classification features, and building one classifier based on the result model and one based on the result web page data area content; classifying by the result model to obtain the probability ω that the simple query interface belongs to domain D; classifying by the result web page data area content to obtain the probability θ that the simple query interface belongs to domain D; and combining the two results and determining the category of the deep web database to be classified according to a weight and a classification threshold. The method enables automatic classification of deep web databases that provide only a simple query interface. Experiments show that the method achieves high accuracy.
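The final combination step might look something like the sketch below. The abstract says only that the two probabilities are combined "according to a weight and a classification threshold", so the linear combination, the default values of 0.5, and the function name `combine_classifiers` are assumptions made for illustration, not the formula claimed in the patent.

```python
def combine_classifiers(omega, theta, weight=0.5, threshold=0.5):
    """Combine the two classifier outputs for one candidate domain D.

    omega     : probability from the result-model classifier
    theta     : probability from the data-area-content classifier
    weight    : assumed share given to omega; theta gets (1 - weight)
    threshold : assumed minimum combined score to assign domain D
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")

    # Assumed linear combination of the two probabilities.
    score = weight * omega + (1.0 - weight) * theta
    return score, score >= threshold


# Hypothetical values: omega = 0.8, theta = 0.6 gives a combined score of
# about 0.7, which clears a threshold of 0.5.
score, belongs = combine_classifiers(0.8, 0.6)
```

In this sketch a database is assigned to domain D only when the combined score reaches the threshold; how the weight and threshold are tuned is not covered by the excerpt.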

Description

Technical field

[0001] The invention relates to a method for information retrieval, in particular to a classification method for deep web databases that provide only a simple query interface, used to classify such databases automatically.

Background

[0002] There are a large number of information pages on the Internet. Search engines can usually index these pages through web crawlers, so that visitors can find the pages they need by keyword. However, with the wide application of Web databases, the Internet is rapidly "deepening". A large number of pages are generated dynamically by back-end databases. The information on these pages cannot be reached directly through static links; it can only be obtained by filling out forms and submitting queries. Because traditional web crawlers cannot efficiently search these pages, existing search engines c...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
Inventor: 崔志明, 鲜学丰, 赵朋朋
Owner: 崔志明