Learning method and system based on incremental Q-Learning

A Web page crawling technology, applied in the field of learning based on incremental Q-Learning. It addresses the low crawling harvest rate, the static models that cannot be updated, and the lack of online incremental learning in existing crawlers, and achieves an improved architecture, strong adaptability, and fast, effective optimization of crawling strategies.

Inactive Publication Date: 2012-11-21
HARBIN INST OF TECH SHENZHEN GRADUATE SCHOOL

AI Technical Summary

Problems solved by technology

[0011] The invention addresses the following problems in existing topical crawling technology: for relatively "narrow" target topics, the crawling harvest rate of the system is low; the generated page (hypertext) classification model and hyperlink evaluation model are static and cannot be updated; and the system lacks the ability to learn incrementally online. As a result, the initial sample pages (including pages in the topic hierarchy directory and sample pages provided by users) become the main factor determining the performance of the hypertext classifier and the hyperlink evaluator.

Examples

Embodiment Construction

[0030] The present invention is further described below with reference to the accompanying drawings and embodiments:

[0031] Reinforcement learning is an important branch of machine learning. From the perspective of an intelligent agent (an agent program: in some query systems, users can state query requirements in whatever format they prefer, and the agent then converts them into strictly defined query parameters suitable for database use), it studies how an autonomous agent perceives its environment and learns an optimal control strategy through interaction with that environment, so that it reaches the goal state under the guidance of the strategy. The agent's search for the goal state is a Markov decision process (MDP), which can be defined by a reward equation; that is, the result of each interaction between the agent and the environment is expressed as a reward. The actions taken by the current environment...
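
As an illustration of an agent learning a control strategy from rewards, the following is a minimal sketch of standard tabular Q-learning, not the patent's own algorithm. The chain-shaped environment, the reward of 1.0 on reaching the goal state, and the parameter values (alpha, gamma, epsilon) are all assumptions chosen for the example.

```python
import random

ACTIONS = (-1, +1)  # hypothetical actions: step left or step right along a chain


def q_learning(n_states=5, episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning on an assumed chain of states 0..n_states-1.

    The agent starts in state 0 and receives reward 1.0 when it enters the
    last state (the goal); every other transition gives reward 0.0.
    """
    rng = random.Random(seed)
    goal = n_states - 1
    q = {(s, a): 0.0 for s in range(n_states) for a in ACTIONS}
    for _ in range(episodes):
        s = 0
        while s != goal:
            if rng.random() < epsilon:
                a = rng.choice(ACTIONS)  # explore
            else:
                best = max(q[(s, b)] for b in ACTIONS)
                # exploit, breaking ties at random so the untrained agent still moves
                a = rng.choice([b for b in ACTIONS if q[(s, b)] == best])
            s_next = min(max(s + a, 0), goal)
            reward = 1.0 if s_next == goal else 0.0
            # the goal is terminal, so it contributes no future value
            future = 0.0 if s_next == goal else max(q[(s_next, b)] for b in ACTIONS)
            q[(s, a)] += alpha * (reward + gamma * future - q[(s, a)])
            s = s_next
    return q


def greedy_policy(q, n_states=5):
    """Best action in each non-goal state according to the learned Q table."""
    return [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(n_states - 1)]
```

Calling `greedy_policy(q_learning())` returns the action the agent prefers in each non-goal state; in this toy chain the learned strategy is to step toward the goal from every state.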

Abstract

The invention relates to a Web page crawling method and crawling system. In the method, the system recalculates the Q value of each node on the hyperlink chain that leads to a newly crawled Web page; it then re-discretizes the newly calculated Q values to form new samples; next, an NB classifier is retrained to obtain a new Q-value classification model, which is used to recalculate the Q value of each candidate URL in the URL queue; finally, the IQ-Learning algorithm lets the page relevance evaluator learn incrementally. The architectural innovation of the system is the addition of a Q-Learning online sample generator, which analyzes and evaluates the pages obtained by online crawling and generates new positive or negative samples, making incremental learning possible. The technique effectively improves the harvest rate of topical crawlers.
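
The four steps in the abstract (recalculate Q values along the link chain, discretize them into samples, retrain the classifier, rescore the URL queue) can be sketched roughly as follows. This is only an illustration under stated assumptions: the excerpt does not give the patent's Q-value formula, so a simple discounted reward is used here; "NB" is assumed to mean naive Bayes; anchor text is assumed to be the classification feature; and `GAMMA`, `BINS`, and every function name are hypothetical.

```python
import math
from collections import Counter, defaultdict

GAMMA = 0.5          # assumed discount factor
BINS = (0.25, 0.75)  # assumed thresholds separating Q classes 0, 1 and 2


def path_q_values(anchor_texts, reward, gamma=GAMMA):
    """Step 1: assign a Q value to each link on the chain to a new page.

    Assumption: the last link gets the page's reward and each earlier link
    is discounted by gamma once per hop.
    """
    n = len(anchor_texts)
    return [(text, reward * gamma ** (n - 1 - i)) for i, text in enumerate(anchor_texts)]


def discretize(q, bins=BINS):
    """Step 2: map a continuous Q value to a class index."""
    return sum(q >= b for b in bins)


class NaiveBayes:
    """Small multinomial naive Bayes over whitespace tokens (add-one smoothing)."""

    def fit(self, samples):
        self.class_counts = Counter()
        self.token_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in samples:
            tokens = text.lower().split()
            self.class_counts[label] += 1
            self.token_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        total = sum(self.class_counts.values())

        def log_score(label):
            counts = self.token_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(self.class_counts[label] / total)
            for tok in text.lower().split():
                score += math.log((counts[tok] + 1) / denom)
            return score

        return max(self.class_counts, key=log_score)


def incremental_step(samples, model, new_path, reward, url_queue):
    """Steps 1-4 for one newly crawled page; returns the rescored queue."""
    samples.extend((text, discretize(q)) for text, q in path_q_values(new_path, reward))
    model.fit(samples)  # step 3: retrain on old plus new samples
    # step 4: order candidate (url, anchor_text) pairs by predicted Q class
    return sorted(url_queue, key=lambda item: model.predict(item[1]), reverse=True)
```

Retraining on the full sample list keeps the sketch short; a real system would more likely update the classifier's counts in place for each new sample.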

Description

Technical field

[0001] The invention relates to a learning method and system based on incremental Q-Learning, used to quickly and effectively retrieve the information users need from the World Wide Web.

Background art

[0002] A Web crawler (also called a spider or robot) is an information collection system. It collects Web pages by downloading them and traversing the Web along the hyperlinks in the crawled pages. General-purpose Web crawlers usually serve as the page collection system of general search engines. Such a crawler usually traverses the Web in breadth-first (that is, non-selective) order and tries to collect as many Web pages as possible within a limited crawl cycle.

[0003] Web crawlers use a specific crawling strategy to periodically collect as many Web pages as possible and then submit them to the automatic indexing system; the indexing system builds an index library based on the corresp...
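
The breadth-first, non-selective traversal described in [0002] can be sketched as follows. To stay self-contained, the sketch walks an in-memory dictionary standing in for the Web instead of downloading pages; `link_graph`, `seed_url`, and `max_pages` are hypothetical names, and `max_pages` is an assumed stand-in for the limited crawl cycle.

```python
from collections import deque


def breadth_first_crawl(link_graph, seed_url, max_pages):
    """Visit pages in breadth-first order, following every hyperlink found.

    link_graph maps each URL to the list of URLs it links to (a stand-in
    for downloading a page and extracting its hyperlinks).
    """
    visited = []
    seen = {seed_url}
    frontier = deque([seed_url])
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()  # FIFO order gives breadth-first traversal
        visited.append(url)
        for link in link_graph.get(url, []):
            if link not in seen:  # never enqueue the same page twice
                seen.add(link)
                frontier.append(link)
    return visited
```

A topical crawler differs from this sketch mainly in the frontier: instead of a FIFO queue it keeps candidates ordered by an estimated relevance score.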

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F17/30, G06N1/00, G06N99/00
Inventor: 叶允明
Owner: HARBIN INST OF TECH SHENZHEN GRADUATE SCHOOL