Invalid template generation method and device as well as invalid web page identification method and device

A technology for generating invalid web page templates and identifying invalid web pages, applied in the field of information processing. It addresses problems of existing approaches, such as features that are difficult to quantify, difficult implementation, and long processing time, and it improves identification accuracy.

Active Publication Date: 2010-12-08
BEIJING SOGOU TECHNOLOGY DEVELOPMENT CO LTD

AI Technical Summary

Problems solved by technology

[0012] This method analyzes web page content in detail, which ensures a certain degree of accuracy. Its disadvantage is that the corpus must be labeled manually, which is time-consuming. Moreover, invalid pages are unevenly distributed in practice, so their features are difficult to quantify and the method is difficult to implement.




Embodiment Construction

[0040] To help those skilled in the art better understand the solutions of the embodiments of the present invention, the embodiments are described in further detail below with reference to the accompanying drawings and implementation manners.

[0041] Before the specific embodiments are introduced, several terms used in the embodiments of the present invention are briefly explained:

[0042] An invalid webpage is a webpage that has no search value in a search engine, such as a prompt shown after a user error or a gateway shutdown notice;

[0043] An invalid webpage template is the feature shared by multiple invalid webpages in a collection of invalid webpages, that is, an identical sentence;

[0044] The local webpage database is the collection of Internet webpages that have already been collected, stored as webpages without HTML tags.
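The definition in [0043] can be illustrated with a minimal Python sketch. It assumes that pages are already tag-free text, that sentences can be split on common punctuation, and that a sentence counts as a template once it appears in at least `min_pages` pages. The function names, the splitting rule, the threshold, and the sample pages are all hypothetical and are not taken from the patent text.

```python
import re
from collections import Counter


def split_sentences(text):
    """Split tag-free page text into trimmed, non-empty sentences (assumed rule)."""
    parts = re.split(r"[.!?。！？\n]+", text)
    return [p.strip() for p in parts if p.strip()]


def common_sentences(pages, min_pages=2):
    """Return sentences that occur in at least `min_pages` different pages."""
    counts = Counter()
    for page in pages:
        # A set ensures each sentence is counted once per page.
        counts.update(set(split_sentences(page)))
    return {s for s, n in counts.items() if n >= min_pages}


# Hypothetical invalid pages, for illustration only.
invalid_pages = [
    "Sorry, the page you requested does not exist. Return to home",
    "Sorry, the page you requested does not exist. Contact the administrator",
    "The gateway is closed for maintenance",
]
templates = common_sentences(invalid_pages)
```

Here only the sentence shared by the first two pages qualifies as a template; sentences that appear on a single page are ignored.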

[0045] The invalid template generation method and invalid webpage identification method of t...


Abstract

The invention discloses an invalid template generation method and device, relating to the technical field of information processing. The invalid template generation method comprises the following steps: acquiring a seed invalid template set that includes one or more seed invalid web page templates; generating a candidate invalid template set according to the seed invalid web page templates and the web pages in a local web page database; and screening the candidate invalid template set to obtain a final invalid template set. The invention also discloses an invalid web page identification method and device. The invention can be used to identify invalid web pages automatically, quickly, and accurately.
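The three steps named in the abstract (seed set, candidate set, screening) can be sketched in Python as follows. The abstract does not say how candidates are generated or screened, so everything here is an assumption made for illustration: candidates are sentences that co-occur with a seed template on some page, and a candidate survives screening if it appears on at least `min_pages` pages and at least `min_ratio` of those pages also contain a seed. All function names, thresholds, and sample pages are hypothetical.

```python
import re


def sentences(text):
    """Return the set of trimmed, non-empty sentences in tag-free text."""
    return {s.strip() for s in re.split(r"[.!?。！？\n]+", text) if s.strip()}


def generate_candidates(seeds, pages):
    """Assumed rule: collect sentences that share a page with a seed template."""
    candidates = set()
    for page in pages:
        sents = sentences(page)
        if sents & seeds:
            candidates |= sents - seeds
    return candidates


def screen(candidates, seeds, pages, min_pages=2, min_ratio=0.8):
    """Assumed rule: keep candidates that mostly appear alongside a seed."""
    page_sents = [sentences(p) for p in pages]
    final = set(seeds)
    for cand in candidates:
        containing = [s for s in page_sents if cand in s]
        with_seed = [s for s in containing if s & seeds]
        if len(containing) >= min_pages and len(with_seed) / len(containing) >= min_ratio:
            final.add(cand)
    return final


def is_invalid(page, templates):
    """Flag a page as invalid if it contains any template sentence."""
    return bool(sentences(page) & templates)


# Hypothetical seed set and local web page database, for illustration only.
seeds = {"The page you requested does not exist"}
local_db = [
    "The page you requested does not exist. Please check the address. Back to home",
    "The page you requested does not exist. Please check the address",
    "Welcome to our store. Back to home",
    "Latest news. Back to home",
]
candidates = generate_candidates(seeds, local_db)
final_templates = screen(candidates, seeds, local_db)
```

In this toy data, "Back to home" becomes a candidate but is screened out because it mostly appears on ordinary pages, while "Please check the address" is kept. A page can then be checked with `is_invalid(page, final_templates)`.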

Description

Technical field

[0001] The invention relates to information processing technology, in particular to an invalid template generation method and device, and an invalid webpage identification method and device.

Background technique

[0002] Some pages on the Internet merely display prompt information, shown because the user performed a wrong operation or because the website data is not ready, such as http://artgle.cn/sceneshow/l18468/l10. Such pages have no search value for users of search engines. These pages are usually stored in the local database, where they not only take up a lot of storage space but also consume a lot of system resources when certain operations are performed, such as in the data accumulation stage (when the spider crawls web pages). If such webpages can be found quickly and accurately, it greatly helps to improve the efficiency of data accumulation and the search results presented to users.

[0003] For th...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
Inventor: 张超旭, 佟子健
Owner: BEIJING SOGOU TECHNOLOGY DEVELOPMENT CO LTD