Similarity-based semi-supervised learning spam page detection method
Technology keywords: semi-supervised learning, spam web pages. Application fields: special data processing applications, instruments, electrical digital data processing, etc. Effect: simplifies the calculation steps.
Status: Inactive. Publication date: 2010-08-25
NANJING UNIV OF POSTS & TELECOMM
Cites: 0; Cited by: 10
AI Technical Summary
Problems solved by technology
Technical problem: the purpose of the present invention is to design a similarity-based semi-supervised learning method for detecting spam web pages that solves the problems arising when semi-supervised learning relies on the link relationships between web pages.
Abstract
The invention relates to a similarity-based semi-supervised learning method for detecting spam pages, which addresses the problems that arise when semi-supervised learning relies on the links between pages. To do so, the method builds a hidden "link" graph based on page similarity. It comprises the following steps:
1. extract content-based and link-based features of each page;
2. apply principal component analysis (PCA) to the features extracted in Step 1 to obtain a reduced feature set;
3. build the hidden "link" graph according to page similarity;
4. build a Gaussian random field model on this graph and perform semi-supervised learning with harmonic functions;
5. combine the classification results of the model from Step 4 with those of other classifiers to improve classification performance.
In the graph, each link between two pages is weighted by their similarity; the Gaussian random field model is then built on the weighted graph and harmonic functions are used for semi-supervised learning, which improves the semi-supervised learning capability.
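Steps 2 to 5 can be sketched in Python with NumPy. This is a minimal, hypothetical sketch, not the patented implementation: the helper names, the Gaussian-kernel similarity, the bandwidth `sigma`, the number of components `k` and the weighted average used in Step 5 are all assumptions, because the source does not specify them. Step 1 is assumed to have already produced a numeric feature matrix `X` (one row per page). The harmonic step follows the standard Gaussian random field formulation, in which the scores of unlabeled pages solve (D_uu - W_uu) f_u = W_ul f_l, with D the diagonal degree matrix of W.

```python
import numpy as np

# Hypothetical helper names; only the overall steps come from the source.

def pca_reduce(X, k):
    """Step 2: project the rows of X onto their top-k principal components."""
    Xc = X - X.mean(axis=0)                           # center every feature
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T                              # n x k component scores

def similarity_graph(Z, sigma=1.0):
    """Step 3: dense hidden 'link' graph weighted by page similarity.
    ASSUMPTION: Gaussian (RBF) kernel on Euclidean distance."""
    sq = np.sum(Z ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T, 0.0)
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)                          # no self-links
    return W

def harmonic_scores(W, labeled_idx, y_labeled):
    """Step 4: harmonic function of a Gaussian random field on W.
    Unlabeled scores solve (D_uu - W_uu) f_u = W_ul f_l."""
    n = W.shape[0]
    labeled_idx = np.asarray(labeled_idx)
    y_labeled = np.asarray(y_labeled, dtype=float)
    unlabeled_idx = np.setdiff1d(np.arange(n), labeled_idx)
    L = np.diag(W.sum(axis=1)) - W                    # graph Laplacian D - W
    L_uu = L[np.ix_(unlabeled_idx, unlabeled_idx)]    # equals D_uu - W_uu
    W_ul = W[np.ix_(unlabeled_idx, labeled_idx)]
    f = np.empty(n)
    f[labeled_idx] = y_labeled                        # labels stay fixed (1 = spam)
    f[unlabeled_idx] = np.linalg.solve(L_uu, W_ul @ y_labeled)
    return f

def combine(f_graph, p_other, alpha=0.5):
    """Step 5 (ASSUMPTION): weighted average with another classifier's spam probabilities."""
    return alpha * np.asarray(f_graph) + (1.0 - alpha) * np.asarray(p_other)

# Toy usage: two tight clusters of pages, one labeled page in each.
X = np.array([[0.0, 0.0, 0.1], [0.1, 0.0, 0.0], [0.0, 0.1, 0.0],
              [5.0, 5.0, 5.1], [5.1, 5.0, 5.0], [5.0, 5.1, 5.0]])
Z = pca_reduce(X, k=2)
W = similarity_graph(Z, sigma=1.0)
f = harmonic_scores(W, labeled_idx=[0, 3], y_labeled=[1.0, 0.0])
final = combine(f, [0.8, 0.6, 0.7, 0.2, 0.3, 0.1])
spam = final > 0.5
```

In the toy data the unlabeled pages inherit the label of the labeled page in their own cluster, because cross-cluster similarity weights are vanishingly small. A dense n x n graph is only practical for small page sets, and `sigma`, `k` and `alpha` would need tuning.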
Description
A Similarity-Based Semi-supervised Learning Spam Detection Method
Technical field
The invention relates to a method for detecting spam web pages for search engines. It mainly addresses the detection of spam web pages when only a small number of samples is available, and belongs to the fields of search engines and semi-supervised machine learning.
Background
Search engines enable users to find the content they are interested in among a large number of web pages. However, the prevalence of web spam has damaged the credibility of search engines and eroded their users' trust. Finding an effective way to reduce the impact of web spam and improve the quality of search engine ranking is therefore very important for helping users quickly find relevant and correct pages. Initially, search engines used traditional information retrieval algorithms, such as TF-IDF (Term Frequency-Inverse Document Frequency) [1], to rank the results returned by queries submitted to t...
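TF-IDF, named above as an early ranking method, weights a term by how often it occurs in a page and how rare it is across the collection. The sketch below uses the common raw-count tf times log(N/df) idf variant; this is an assumption, since the source does not say which TF-IDF variant it means, and the whitespace tokenization and the `tf_idf`/`score` names are illustrative only. The toy example also hints at why such ranking is open to spam: repeating a term (keyword stuffing) raises its weight linearly.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document: raw count * log(N / df)."""
    n = len(docs)
    tokenized = [doc.lower().split() for doc in docs]       # naive whitespace tokens
    df = Counter(term for tokens in tokenized for term in set(tokens))
    return [{t: c * math.log(n / df[t]) for t, c in Counter(tokens).items()}
            for tokens in tokenized]

def score(query, weights):
    """Rank score of one document: sum of its weights for the query terms."""
    return sum(weights.get(t, 0.0) for t in query.lower().split())

docs = ["cheap pills cheap pills cheap",
        "search engine ranking",
        "spam pages hurt search ranking"]
w = tf_idf(docs)
# Document indices ordered by descending score for the query.
ranking = sorted(range(len(docs)), key=lambda i: score("cheap ranking", w[i]), reverse=True)
```

The stuffed first document ranks highest for the query because "cheap" appears three times and in no other document.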
Application Information
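IPC (8): G06F17/30
Inventors: Zhang Weifeng (张卫丰), Zhu Danmei (朱丹梅), Zhou Guoqiang (周国强), Zhang Yingzhou (张迎周), Lu Liumin (陆柳敏), Xu Bidi (许碧娣), Liu Xia (刘霞)
Owner: NANJING UNIV OF POSTS & TELECOMM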