Information extraction method

A method that extracts web page information based on word segmentation result information, applied in the field of information search, which addresses the low efficiency of crawler work and the waste of storage space.

Active Publication Date: 2010-07-28
蔡亮华

AI Technical Summary

Problems solved by technology

[0005] The main purpose of the present invention is to provide a method for grabbing information, which is used to solve the problem of omission of information valuable to the search topic i



Embodiment Construction

[0014] The technical solutions of the present invention will be described in further detail below with reference to the accompanying drawings and embodiments.

[0015] Figure 1 is a flow chart of the first embodiment of the information grabbing method of the present invention. As shown in Figure 1, the method includes:

[0016] Step 100: the crawler program obtains web page information related to the search topic and performs word segmentation on that information, obtaining word segmentation result information that contains several words;

[0017] Here, "several words" in the word segmentation result information is a general term; an item may also be a phrase composed of words. For example, word segmentation of "Beijing hosts the Olympic Games" yields "Beijing", "hosting" and "Olympic Games". Word segmentation methods include the string matching method, understanding-b...
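The string matching method mentioned above can be illustrated with forward maximum matching, one common string-matching segmenter. This is a minimal sketch, not the patent's own procedure: the function name `forward_max_match`, the dictionary contents, and the Chinese sentence (an assumed back-translation of the "Beijing hosts the Olympic Games" example) are all hypothetical.

```python
def forward_max_match(text, dictionary, max_len=None):
    """Greedy segmentation: at each position take the longest dictionary entry.

    Characters that start no dictionary entry are emitted as single-character tokens.
    """
    if max_len is None:
        max_len = max((len(word) for word in dictionary), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until something matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Hypothetical dictionary; a real system would load a much larger lexicon.
DICTIONARY = {"北京", "举办", "奥运", "奥运会"}

# Assumed original-language form of "Beijing hosts the Olympic Games".
tokens = forward_max_match("北京举办奥运会", DICTIONARY)
print(tokens)
```

Because the longest match is preferred, the three-character entry for "Olympic Games" wins over the shorter two-character entry that shares its prefix, which is how a phrase can appear as a single item in the segmentation result.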


Abstract

The invention relates to an information extraction method comprising the following steps: a crawler program obtains web page information related to the search topic and performs word segmentation on it to obtain word segmentation result information comprising a plurality of words and/or phrases; a weighting operation is carried out on the words and/or phrases, based on the same semantic attribute parameter in the semantic corpus, to obtain the semantic attribute parameter of the web page information; and the web page information is stored in the extraction result queue if its semantic attribute parameter falls within a preset range for that parameter. Embodiments of the invention ensure a high degree of correlation between the extracted results and the search topic, improve the working efficiency of the crawler program, and reduce the storage space it occupies.
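The weighting and range check described in the abstract might be sketched as follows. The abstract does not specify the weighting formula, so the weighted average used here is an assumption, as are the corpus values, the function names `page_semantic_score` and `maybe_enqueue`, and the range bounds.

```python
from collections import deque

# Hypothetical semantic corpus: term -> (semantic attribute value, weight).
SEMANTIC_CORPUS = {
    "Beijing": (0.6, 1.0),
    "hosting": (0.2, 0.5),
    "Olympic Games": (0.9, 2.0),
}


def page_semantic_score(tokens, corpus):
    """Weighted average of the attribute values of the terms found in the corpus.

    Terms missing from the corpus are ignored; a page with no known terms scores 0.0.
    """
    total = 0.0
    weight_sum = 0.0
    for token in tokens:
        if token in corpus:
            value, weight = corpus[token]
            total += value * weight
            weight_sum += weight
    return total / weight_sum if weight_sum else 0.0


def maybe_enqueue(page, tokens, corpus, queue, low, high):
    """Append the page to the result queue only if its score lies in [low, high]."""
    score = page_semantic_score(tokens, corpus)
    if low <= score <= high:
        queue.append(page)
        return True
    return False


result_queue = deque()
# Assumed preset range for the semantic attribute parameter.
LOW, HIGH = 0.5, 1.0

on_topic = maybe_enqueue(
    "page-a", ["Beijing", "hosting", "Olympic Games"], SEMANTIC_CORPUS, result_queue, LOW, HIGH
)
off_topic = maybe_enqueue("page-b", ["weather"], SEMANTIC_CORPUS, result_queue, LOW, HIGH)
print(on_topic, off_topic, list(result_queue))
```

Pages whose score falls outside the range are never stored, which is the mechanism the abstract credits for the reduced storage footprint.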

Description

Technical field

[0001] The invention relates to information search technology, in particular to an information grabbing method.

Background technique

[0002] With the popularity of the Internet, people increasingly use information search engines in their daily work and life to obtain various information they need from the Internet. Therefore, information search technology occupies an important position in the Internet industry. In recent years, people's requirements for search results have become higher and higher.

[0003] At present, each search engine mainly uses a web crawler to obtain web page information related to a user's search topic. A web crawler is a program that automatically extracts web pages. It downloads and obtains web pages from the Internet according to the search topics provided by users. The web page information includes web news, forums, blogs, and other web pages. The web crawler can start from the addresses of one or several initial web pages, and...


Application Information

IPC(8): G06F17/30
Inventor: 蔡亮华, 庞然, 胡新宇
Owner: 蔡亮华