Method and system for recognizing concept type web pages

A technique for recognizing concept-type web pages, applied in the field of network information processing. It addresses problems such as limited recognition methods, low user awareness of sites, and the inability to make certain distinctions, and it improves recognition speed, search results, and coverage.

Active Publication Date: 2012-05-30
BEIJING SOGOU TECHNOLOGY DEVELOPMENT CO LTD

AI Technical Summary

Problems solved by technology

[0005] Invention 1 provides an independent, automatic, and efficient means of identifying conceptual documents, but it has the following problems in practical application. (1) The method of Invention 1 has a certain error rate. Theoretically, the ability to distinguish between the two cannot be achieved merely by adjusting the system of Invention 1 itself. (2) The method of Invention 1 also misses some documents. That is, although Invention 1 guarantees relatively accurate and efficient recognition, it does not guarantee that all conceptual documents are covered and recognized. On the Internet in particular, the descriptive text of some conceptual pages does not contain the concept word itself; instead, the concept word appears separately as the page title. Other conceptual documents are very short and do not give the method of Invention 1 an adequate basis for judgment. The method of Invention 1 therefore has inherent defects in the accuracy and recall of conceptual document recognition, caused either by the limitations of the method or by the diversity of documents.
In addition, even for accurately identified conceptual documents, user behavior analysis shows that users' perception of a site's authority affects which search results they select. Users are more inclined to trust and choose results from websites that provide a large number of conceptual documents, whereas websites with only a few conceptual documents are less familiar to users, and their results are harder to trust. Therefore, although the method of Invention 1 technically provides a means of quickly and accurately identifying conceptual documents, it still cannot fully meet users' search needs.



Examples


Embodiment Construction

[0030] The present invention will be described in detail below with reference to the accompanying drawings and in combination with embodiments. Throughout, the same reference numerals denote the same elements.

[0031] Figure 1 is a flowchart of the method for identifying conceptual webpages according to the first embodiment of the present invention.

[0032] Referring to Figure 1, the method for identifying a conceptual webpage according to an embodiment of the present invention includes the following steps:

[0033] Step S102, acquiring a plurality of conceptual webpages from the webpage database;

[0034] Step S104, comparing the number of conceptual-webpage URIs under each directory, at every level, of each site domain name with a first threshold, and determining each directory whose number of conceptual-webpage URIs is greater than the first threshold to be a conceptual directory; and

[0035] Step S106, matching the URI of the webpage to be identified with eac...
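Steps S102 and S104 can be illustrated with a minimal Python sketch. It is not the patent's implementation: the function names `directory_prefixes` and `find_concept_directories`, the `host/dir/` string representation of a directory, and the choice to count the site root as a directory level are all assumptions made for illustration.

```python
from collections import Counter
from urllib.parse import urlsplit


def directory_prefixes(uri):
    """Yield every directory level of a URI, from the site root down.

    Assumption: a directory is written as 'host/dir1/dir2/' and the
    site root ('host/') counts as the top directory level.
    """
    parts = urlsplit(uri)
    host = parts.netloc.lower()
    # Drop the last path segment (the page itself); keep the directories.
    segments = [s for s in parts.path.split("/")[:-1] if s]
    prefix = host + "/"
    yield prefix
    for segment in segments:
        prefix += segment + "/"
        yield prefix


def find_concept_directories(concept_uris, first_threshold):
    """Return directories that hold more than first_threshold known
    conceptual pages (the comparison described in step S104)."""
    counts = Counter()
    for uri in set(concept_uris):  # count each distinct URI once
        counts.update(directory_prefixes(uri))
    return {d for d, n in counts.items() if n > first_threshold}
```

Because a page is counted under every ancestor directory, a site whose conceptual pages are spread over several subdirectories can still qualify at a higher level even when no single subdirectory passes the threshold.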


Abstract

The invention discloses a method and a system for identifying conceptual web pages. The method comprises the following steps: acquiring a plurality of conceptual web pages from a web page database; comparing the number of conceptual-web-page URIs under each directory, at every level, of each website domain name with a first threshold, and determining each directory whose count is greater than the first threshold to be a conceptual directory; and matching the URI of a web page to be identified against each conceptual directory and, if it matches, determining that web page to be a conceptual web page. The method can quickly and comprehensively determine whether a web page is a conceptual web page and which class it belongs to. It increases the identification rate and markedly improves coverage when identifying conceptual documents in massive web page data.
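The final matching step can be sketched in Python as a prefix test. This is an illustration under stated assumptions, not the patented implementation: the names `normalize` and `is_concept_page` are hypothetical, conceptual directories are assumed to be stored as `host/dir/` strings with a trailing slash, and "matching" is assumed to mean that the page URI lies under the directory.

```python
from urllib.parse import urlsplit


def normalize(uri):
    """Reduce a URI to 'host/path' so the scheme does not affect matching."""
    parts = urlsplit(uri)
    return parts.netloc.lower() + (parts.path or "/")


def is_concept_page(uri, concept_directories):
    """Return True if the URI falls under any known conceptual directory.

    Assumption: each directory is a 'host/dir/' string ending in '/',
    so 'a.example/wiki/' does not match 'a.example/wikipedia/...'.
    """
    key = normalize(uri)
    return any(key.startswith(directory) for directory in concept_directories)
```

With the hypothetical directory set `{"a.example/wiki/"}`, a page at `http://a.example/wiki/q.htm` would be classified as conceptual without inspecting its content, which is how this approach can cover pages that are too short for content-based recognition.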

Description

Technical field

[0001] The present invention relates to the field of network information processing, and more specifically, to a method and system for identifying conceptual webpages.

Background technique

[0002] With the rapid increase of text and multimedia content used in the Internet and other data networks and systems, the data volume of network information has increased dramatically. Therefore, how to help users obtain required information from massive network information as quickly and accurately as possible has become a hot issue in the field of network information processing.

[0003] "Concept" usually refers to a knowledge unit (or a general semantic unit) formed by a unique combination of features. Conceptual documents usually take the explanation of the concept as the subject of the document, and describe the connotation and extension of the same concept.

[0004] The prior art proposes a technical solution for analyzing and processing various network informat...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F17/30
Inventor: 刘琳
Owner: BEIJING SOGOU TECHNOLOGY DEVELOPMENT CO LTD