Searching engine based on information extraction technique

Information extraction and search engine technology, applied to existing search engines, addresses problems such as inaccurate returned information and improves work efficiency.

Inactive Publication Date: 2003-04-16
ZHEJIANG UNIV

AI Technical Summary

Problems solved by technology

However, the information given by search engines still needs to be manually selected, and because of the inaccuracy of the returned information, this selection work is also very heavy.


Embodiment Construction

[0014] The present invention is described in detail below using a specific scientific research application (acquiring information from Call Paper pages). The search engine system based on information extraction technology works as follows:

[0015] Step 1: Machine Learning

[0016] Machine learning process: Depending on the purpose of the information extraction and the target field, corresponding training samples are prepared and marked manually. The prepared samples are handed to the learning machine, and the learning machine's rule set is adjusted until it meets the stated requirements.

1. Training samples

1. Page acquisition

[0017] The training samples are web pages of the same kind as the information source, so they are obtained from the Internet: regular Call Paper pages are located online and used as samples.

[0018] A) Use the existing search engine to search for pages with the Call Paper field; as long as the page contains the...
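The adjustment of the rule set described in paragraph [0016] can be illustrated with a minimal sketch. The patent text here does not specify how rules are represented or how the requirement is measured, so everything below is an assumption made for illustration: rules are modeled as regular expressions, the requirement as a precision threshold (`min_precision`), and the names `rule_precision` and `adjust_rule_set` as well as the sample pages are hypothetical.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical labeled sample: (page text, manually marked field values).
Sample = Tuple[str, Dict[str, str]]


def rule_precision(field: str, pattern: str, samples: List[Sample]) -> float:
    """Fraction of the rule's matches that agree with the manual label."""
    hits = 0
    correct = 0
    for text, labels in samples:
        match = re.search(pattern, text)
        if match is None:
            continue
        hits += 1
        if match.group(1).strip() == labels.get(field):
            correct += 1
    return correct / hits if hits else 0.0


def adjust_rule_set(
    candidates: Dict[str, List[str]],
    samples: List[Sample],
    min_precision: float = 0.9,  # assumed threshold, not from the patent
) -> Dict[str, List[str]]:
    """Keep only candidate rules whose precision on the samples meets the threshold."""
    kept: Dict[str, List[str]] = {}
    for field, patterns in candidates.items():
        good = [p for p in patterns
                if rule_precision(field, p, samples) >= min_precision]
        if good:
            kept[field] = good
    return kept


# Invented sample pages with manually marked deadline values.
samples: List[Sample] = [
    ("Call for Papers. Deadline: May 1, 2003. Location: Hangzhou",
     {"deadline": "May 1, 2003"}),
    ("CALL FOR PAPERS Deadline: June 15, 2003. Contact: chair",
     {"deadline": "June 15, 2003"}),
]
candidates = {
    "deadline": [
        r"Deadline:\s*([A-Z][a-z]+ \d{1,2}, \d{4})",  # specific rule
        r"Deadline:\s*(\w+)",                          # overly general rule
    ]
}
rules = adjust_rule_set(candidates, samples)
```

In this sketch the overly general rule captures only the month name, disagrees with the manual labels, and is dropped, which mirrors the idea of tuning the number and abstraction level of rules against marked samples.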


Abstract

Machine learning is applied to a set of sample HTML pages that contain homogeneous information in similar layouts, in order to obtain rules for extracting information and structural information from quasi-free HTML text. The number of rules and their degree of abstraction are adjusted through training so that the precision requirement is met. The rule set obtained from learning is then used to extract information from text files outside the sample set: pages with specific content are collected with a search engine, and the rules are applied to them. The invention raises the efficiency of information processing because it combines information extraction technology with search engine technology.
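As a rough illustration of the final step in the abstract, applying a learned rule set to pages outside the sample set, the sketch below strips the HTML markup and runs each rule over the remaining text. It is not the patented method: the rule format (regular expressions with one capture group), the field names, the function names `html_to_text` and `extract`, and the example page are all assumptions.

```python
import re
from html.parser import HTMLParser
from typing import Dict, List


class TextExtractor(HTMLParser):
    """Collects the text content of an HTML page, ignoring the tags."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: List[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def html_to_text(html: str) -> str:
    """Return the page text with tags removed and whitespace normalized."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


def extract(html: str, rules: Dict[str, List[str]]) -> Dict[str, str]:
    """Apply each field's rules in order and keep the first match."""
    text = html_to_text(html)
    record: Dict[str, str] = {}
    for field, patterns in rules.items():
        for pattern in patterns:
            match = re.search(pattern, text)
            if match:
                record[field] = match.group(1).strip()
                break
    return record


# Hypothetical rule set, standing in for the output of the learning stage.
rules = {
    "deadline": [r"Deadline:\s*([A-Z][a-z]+ \d{1,2}, \d{4})"],
    "location": [r"Location:\s*([A-Z][A-Za-z ]+)"],
}
# Invented page, standing in for a result returned by an existing search engine.
page = (
    "<html><body><h1>Call for Papers</h1>"
    "<p><b>Deadline:</b> May 1, 2003</p>"
    "<p>Location: Hangzhou</p></body></html>"
)
record = extract(page, rules)
```

The resulting `record` is a small structured summary of the page, which is the kind of output that would spare the user from reading and selecting search results by hand.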

Description

Technical field

[0001] The present invention relates to information extraction and search engine technology; specifically, it is a technical implementation that applies domain-specific information extraction to existing search engines.

Background technique

[0002] Information extraction is a technology that uses computers to extract useful information from free and semi-free text according to certain rules, organize it, and present it to users. Information extraction in a specific field is guided by domain knowledge: manually marked, regular sample sets are used for training so that the abstraction level and coverage of the rules in the extraction mechanism reach the most reasonable level, and information is then extracted from text outside the sample set. This technology has always been the core issue in the field of computer artificial int...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/40
Inventor: 吴朝晖, 徐杰锋, 陆伟
Owner: ZHEJIANG UNIV