List page recognition system and method

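A technology relating to recognition systems and list pages, applied in the Internet field. It addresses problems such as list pages becoming invalid, list pages being hard to collect completely, and new list pages being missed, and it achieves high accuracy.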

Active Publication Date: 2015-12-23
BEIJING QIHOO TECH CO LTD
Cites: 7; Cited by: 1

AI Technical Summary

Problems solved by technology

[0005] Traditionally, it is difficult to collect all list pages manually or with known rules. Moreover, after a website is revised, its old list pages become invalid and new list pages may be missed. As a result, traditional vertical search engines always contain a large amount of content that cannot be found, which lowers search accuracy.

Embodiment Construction

[0046] Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. Although exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited by the embodiments set forth herein. Rather, these embodiments are provided for a more thorough understanding of the present disclosure and to fully convey its scope to those skilled in the art.

[0047] The process flow of the list page identification method in this embodiment, shown in Figure 1, includes the following steps:

[0048] Step S110: extract the page frame of the pre-acquired webpage and calculate the page frame ID. The pre-acquired webpage may be a webpage crawled by a whole-network search. The page frame of the webpage is extracted as follows: extract the page frame of the web page according ...
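Because the extraction rule for Step S110 is truncated above, the following is only a minimal Python sketch of what "extract a page frame and calculate its ID" could look like, under assumptions that do not come from the patent: the page frame is taken to be the set of root-to-node HTML tag paths (text and attributes ignored), and the page frame ID is an MD5 digest of that frame. The names `extract_page_frame` and `page_frame_id` are hypothetical.

```python
import hashlib
from html.parser import HTMLParser

# Void elements never receive a closing tag, so they must not stay on the stack.
_VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
              "input", "link", "meta", "source", "track", "wbr"}


class _FrameParser(HTMLParser):
    """Collects root-to-node tag paths, ignoring text and attributes (hypothetical frame definition)."""

    def __init__(self) -> None:
        super().__init__()
        self._stack: list[str] = []
        self.paths: set[str] = set()

    def handle_starttag(self, tag, attrs):
        self._stack.append(tag)
        self.paths.add("/".join(self._stack))
        if tag in _VOID_TAGS:
            self._stack.pop()

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed children.
        if tag in self._stack:
            while self._stack.pop() != tag:
                pass


def extract_page_frame(html: str) -> str:
    """Hypothetical page frame: sorted, de-duplicated tag paths, one per line."""
    parser = _FrameParser()
    parser.feed(html)
    parser.close()
    return "\n".join(sorted(parser.paths))


def page_frame_id(frame: str) -> str:
    """Hypothetical page frame ID: MD5 digest of the frame text."""
    return hashlib.md5(frame.encode("utf-8")).hexdigest()
```

Representing the frame as a set of tag paths means that two pages built from the same template, such as a list page with two items and one with three, produce the same frame and therefore the same ID, while a page with a different layout does not. A production system would likely need a more robust frame definition; this choice is purely illustrative.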

Abstract

The invention discloses a list page recognition system and relates to the technical field of the Internet. The system comprises a page frame ID calculation module, a mode accumulation module, and a list page recognition module. The page frame ID calculation module is configured to extract the page frames of pre-acquired webpages and calculate page frame IDs; the mode accumulation module is configured to calculate the page frame mode once the number of page frames sharing the same ID reaches a threshold; and the list page recognition module is configured to compare the page frame modes with the page frame modes of list pages in a pre-built product knowledge base in order to recognize list pages. The invention further discloses a list page recognition method. Because the system and method calculate the page frame modes of webpages and compare them against the product knowledge base to recognize list pages, they solve the problem that list pages are difficult to collect completely, with the advantages of comprehensive list page collection and high search accuracy.
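The three modules described in the abstract can be sketched as one small pipeline. This is a minimal Python sketch, not the patented implementation, and it rests on assumptions the abstract does not state: a page frame is a set of root-to-node tag paths; the page frame ID hashes only the top `depth` levels so pages with the same coarse layout share an ID; the "page frame mode" is the set of paths common to every page accumulated under an ID; and the knowledge-base comparison uses Jaccard similarity. The names `ListPageRecognizer`, `count_threshold`, `similarity_threshold`, and `depth` are all hypothetical.

```python
import hashlib
from typing import Dict, FrozenSet, Optional

# Assumption: a page frame is the set of root-to-node tag paths of a page,
# e.g. frozenset({"html", "html/body", "html/body/ul", "html/body/ul/li"}).
Frame = FrozenSet[str]


def frame_id(frame: Frame, depth: int = 3) -> str:
    """Hypothetical page frame ID: hash of the paths at most `depth` levels deep."""
    coarse = sorted(p for p in frame if p.count("/") < depth)
    return hashlib.md5("\n".join(coarse).encode("utf-8")).hexdigest()


def jaccard(a: Frame, b: Frame) -> float:
    """Similarity of two frames: shared paths divided by all paths."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


class ListPageRecognizer:
    """Hypothetical combination of the ID, mode accumulation, and recognition modules."""

    def __init__(self, knowledge_base: Dict[str, Frame],
                 count_threshold: int = 3, similarity_threshold: float = 0.8) -> None:
        self.knowledge_base = knowledge_base          # name -> known list page mode
        self.count_threshold = count_threshold        # pages per ID before a mode exists
        self.similarity_threshold = similarity_threshold
        self._counts: Dict[str, int] = {}
        self._common: Dict[str, Frame] = {}           # running intersection per ID
        self.modes: Dict[str, Frame] = {}

    def add_page(self, frame: Frame) -> Optional[str]:
        """Accumulate one page; return the matched knowledge-base entry, if any."""
        fid = frame_id(frame)
        self._counts[fid] = self._counts.get(fid, 0) + 1
        self._common[fid] = self._common.get(fid, frame) & frame
        if self._counts[fid] < self.count_threshold:
            return None                               # not enough pages for a mode yet
        self.modes[fid] = self._common[fid]           # mode = paths shared by all pages
        return self.recognize(self.modes[fid])

    def recognize(self, mode: Frame) -> Optional[str]:
        """Return the most similar known list page mode above the threshold."""
        best_name, best_score = None, 0.0
        for name, kb_mode in self.knowledge_base.items():
            score = jaccard(mode, kb_mode)
            if score > best_score:
                best_name, best_score = name, score
        return best_name if best_score >= self.similarity_threshold else None
```

With `count_threshold=2`, the first page under a given ID returns `None`, and the second page that shares a known list page's coarse layout returns that knowledge-base entry's name. Taking the intersection as the mode drops paths that appear on only some pages (for example, an optional image inside a list item), so the mode reflects the shared template rather than any single page.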

Description

[0001] This patent application is a divisional application of Chinese invention patent application No. 201210376384.8, filed on September 29, 2012, and entitled "List page identification system and method".
Technical field
[0002] The invention relates to the technical field of the Internet, and in particular to a list page identification system and method.
Background art
[0003] Search technology falls into two basic categories. One takes the entire Internet as its object: it crawls all web pages (currently, crawling depth within a site is limited, JavaScript (js) is generally not processed, and only some dynamic pages are processed) and then processes and analyzes them. This is web search, that is, whole-network search. The other is vertical search, which crawls and analyzes only certain types of pages, such as image search, video search, blog search, forum search, news search, et...

Application Information

Patent Type & Authority: Applications (China)
IPC (8th edition): G06F17/30
CPC: G06F16/951
Inventor: 卢宏林 (Lu Honglin)
Owner: BEIJING QIHOO TECH CO LTD