SVM (Support Vector Machine)-based Web partitioning method

An SVM-based Web site partitioning technology, applied in special data processing applications, instruments, electrical digital data processing, and related fields. It addresses the scheduling failures that occur when a partition set cannot be guaranteed to contain crawler nodes, leaving its Web sites with no crawler to crawl them. It achieves strong generalization ability, good fault tolerance, and good classification performance.

Inactive Publication Date: 2011-11-23
HARBIN INST OF TECH
0 Cites, 6 Cited by

AI Technical Summary

Problems solved by technology

[0003] The clustering-based Web partitioning method HONet works well only when the intra-class distance is small and the inter-class distance is large. Otherwise, the partitioning result depends on the sampling order, which generally leaves too many samples in the first partition set.
[0004] Finally, the IWAP algorithm, based on an iterative self-organizing Web partitioning strategy, cannot guarantee that each partition set contains crawler nodes, or contains enough of them. Web sites in a partition set without crawlers are then never crawled, and scheduling fails.


Embodiment Construction

[0028] This embodiment provides a Web partitioning method based on a support vector machine. As shown in Figure 1, the steps are:

[0029] (1) Divide all Web sites into N groups;

[0030] (2) For each K = 1, 2, 3, ..., N, select the Web site samples in groups 1 to K-1 and K+1 to N, and use them to initialize LibSVM training;

[0031] (3) Train LibSVM;

[0032] (4) Store the trained SVM model;

[0033] (5) Select the Web site samples in the Kth group and run the Web partitioning test on them;

[0034] (6) Save the Web partitioning test results;

[0035] (7) If K

[0036] (8) If the partitioning result is below the expected result result_sat, repeat steps (1) to (7); otherwise the procedure ends, the Web partitioning is complete, and the partitioning result, result, is obtained.
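The steps above amount to N-fold cross-validation wrapped in a retry loop, and can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: a nearest-centroid classifier stands in for LibSVM so the example stays self-contained, step (7) is assumed to advance K to the next group, the result is assumed to be the mean held-out accuracy, and the names `partition_web`, `cross_validate`, and `max_rounds` are hypothetical. Each sample is assumed to be a `(features, label)` pair, for example network coordinates and a partition label.

```python
import random
from statistics import mean


def split_into_groups(samples, n_groups, rng):
    """Step (1): shuffle the samples and deal them into n_groups groups."""
    shuffled = samples[:]
    rng.shuffle(shuffled)
    return [shuffled[i::n_groups] for i in range(n_groups)]


def train_centroids(train):
    """Stand-in for LibSVM training (assumption): one centroid per label."""
    by_label = {}
    for features, label in train:
        by_label.setdefault(label, []).append(features)
    return {label: tuple(mean(col) for col in zip(*rows))
            for label, rows in by_label.items()}


def predict(model, features):
    """Assign the label whose centroid is nearest (squared Euclidean distance)."""
    return min(model, key=lambda label: sum(
        (a - b) ** 2 for a, b in zip(features, model[label])))


def cross_validate(samples, n_groups, rng):
    """Steps (1) to (7): train on all groups except K, test on group K."""
    groups = split_into_groups(samples, n_groups, rng)
    models, scores = [], []
    for k in range(n_groups):
        # Step (2): training set is groups 1..K-1 and K+1..N.
        train = [s for i, g in enumerate(groups) if i != k for s in g]
        model = train_centroids(train)      # step (3)
        models.append(model)                # step (4)
        held_out = groups[k]                # step (5)
        correct = sum(predict(model, f) == y for f, y in held_out)
        scores.append(correct / len(held_out))  # step (6)
    # Assumption: the overall result is the mean held-out accuracy.
    return mean(scores), models


def partition_web(samples, n_groups, result_sat, max_rounds=10, seed=0):
    """Step (8): repeat until the result reaches result_sat.

    max_rounds is a hypothetical safety cap; the source describes no limit.
    """
    rng = random.Random(seed)
    result, models = 0.0, []
    for _ in range(max_rounds):
        result, models = cross_validate(samples, n_groups, rng)
        if result >= result_sat:
            break
    return result, models
```

The sketch requires at least as many samples as groups, so that no held-out group is empty. Swapping `train_centroids` and `predict` for calls into an actual SVM library would leave the surrounding loop unchanged.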

[0037] The overall flow of the SVM-based Web partitioning algorithm is as follows:

[0038]

[0039]

[0040] The specific...

Abstract

The invention provides an SVM (Support Vector Machine)-based Web partitioning method, which comprises the steps of: partitioning all Web sites into N groups; for each K = 1, 2, 3, ..., N, selecting the Web site samples in groups 1 to K-1 and K+1 to N and initializing LibSVM training; training LibSVM; storing the trained SVM model; selecting the Web site samples in the Kth group to perform a Web partitioning test; and saving the Web partitioning test result. According to the method, the SVM generalizes well and offers better fault tolerance and classification when processing noisier data. The accuracy of the coordinates established by the network coordinate system is approximately 80 percent. The SVM can handle nonlinear classification, and because the number of SVM classes is fixed, the extreme case in which a Web site has no crawler is avoided. Using a classification algorithm also removes the uncertainty in the number of partition sets that arises with clustering algorithms.

Description

Technical field

[0001] The invention relates to a Web partitioning method based on a support vector machine, and belongs to the technical field of Web partitioning methods.

Background technique

[0002] First, the chainsaw algorithm, a random Web partitioning method, balances the load of the entire system very well, but it ignores the location of nodes entirely, so the network distance overhead is too large.

[0003] Second, the clustering-based Web partitioning method HONet is suitable for situations where the intra-class distance is small and the inter-class distance is large. Otherwise, the partitioning result depends on the sampling order, which generally leaves too many samples in the first partition set.

[0004] Finally, the IWAP algorithm, based on an iterative self-organizing Web partitioning strategy, cannot guarantee that each partition set contains crawler nodes, or contains a sufficient number of crawler nodes, w...

Application Information

IPC(8): G06F17/30
Inventor: 张伟哲, 张宏莉, 何慧, 邸文晨, 魏一帆
Owner: HARBIN INST OF TECH