Very-large-scale automatic categorizer for Web content

A technology for automatically categorizing Web content, applied in the field of data processing. It addresses the poor performance of existing systems, pages that are difficult for casual browsers to find, and the high cost of finding experts to classify pages.

Status: Inactive. Publication date: 2005-01-27
MICROSOFT TECH LICENSING LLC
Cites: 31. Cited by: 18.
AI Technical Summary

Problems solved by technology

Information, opinion, and news are available about a vast array of topics, but the challenge is to find those pages of the Web which are most relevant to the particular needs or desires of the user at any given moment.
This method, in which experts classify pages by hand, has several problems. Finding experts to perform the classification is costly. There is a necessary backlog between the time a site is placed on the Web and the time (if ever) it enters the classification hierarchy. Moreover, an expert grader in one subject area may misclassify a page from another subject, which can make the page more difficult for the casual browser to find.
Although this is an active area of research, existing systems typically handle only a limited number of subject fields and often perform poorly.

Embodiment Construction

[0022] In the following description, various aspects of the present invention will be described. However, it will be apparent to those skilled in the art that the present invention may be practiced with only some or all aspects of the present invention. For purposes of explanation, specific numbers, materials and configurations are set forth in order to provide a thorough understanding of the present invention. However, it will also be apparent to one skilled in the art that the present invention may be practiced without the specific details. In other instances, well known features are omitted or simplified in order not to obscure the present invention.

[0023] Parts of the description will be presented in terms of operations performed by a processor-based device, using terms such as data, storing, selecting, determining, calculating, and the like, consistent with the manner commonly employed by those skilled in the art to convey the substance of their work to others skilled in the a...

Abstract

A method and apparatus for efficiently classifying and categorizing data objects, such as electronic text, graphics, and audio-based documents, within very-large-scale hierarchical classification trees is provided. In accordance with one embodiment of the invention, a first node of a plurality of nodes of a subject hierarchy is selected. Previously classified data objects corresponding to the selected first node, as well as to any associated sub-nodes of the selected node, are aggregated to form a content class of data objects. Similarly, data objects corresponding to sibling nodes of the selected node and any associated sub-nodes of the sibling nodes are aggregated to form an anti-content class of data objects. Features are then extracted from each of the content class and the anti-content class of data objects to facilitate characterization of said previously classified data objects.
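The content/anti-content aggregation described in the abstract can be illustrated with a short Python sketch. This is a minimal sketch under stated assumptions, not the patented implementation: the `Node` class, the helper names, whitespace tokenization, and the in-memory tree are all hypothetical. The abstract does not say how features are scored, so `extract_features` substitutes a simple Laplace-smoothed log-ratio of document frequencies between the two classes as a stand-in.

```python
from collections import Counter
from dataclasses import dataclass, field
import math


@dataclass
class Node:
    """A node in a subject hierarchy (hypothetical structure)."""
    name: str
    documents: list[str] = field(default_factory=list)  # previously classified docs
    children: list["Node"] = field(default_factory=list)
    # Back-link to the parent; excluded from repr/eq to avoid infinite recursion.
    parent: "Node | None" = field(default=None, repr=False, compare=False)

    def add_child(self, child: "Node") -> "Node":
        child.parent = self
        self.children.append(child)
        return child


def subtree_documents(node: Node) -> list[str]:
    """Collect the documents of a node and of all its sub-nodes."""
    docs = list(node.documents)
    for child in node.children:
        docs.extend(subtree_documents(child))
    return docs


def content_and_anti_content(node: Node) -> tuple[list[str], list[str]]:
    """Content class: the node's subtree. Anti-content: its siblings' subtrees."""
    content = subtree_documents(node)
    anti: list[str] = []
    if node.parent is not None:
        for sibling in node.parent.children:
            if sibling is not node:
                anti.extend(subtree_documents(sibling))
    return content, anti


def extract_features(content: list[str], anti: list[str], top_k: int = 5) -> list[str]:
    """Rank terms that favor the content class (assumed scoring, not the patent's)."""
    def doc_freq(docs: list[str]) -> Counter:
        df: Counter = Counter()
        for doc in docs:
            df.update(set(doc.lower().split()))  # count each term once per doc
        return df

    df_c, df_a = doc_freq(content), doc_freq(anti)
    n_c, n_a = len(content), len(anti)
    scores = {}
    for term in set(df_c) | set(df_a):
        p_c = (df_c[term] + 1) / (n_c + 2)  # Laplace-smoothed document frequency
        p_a = (df_a[term] + 1) / (n_a + 2)
        scores[term] = math.log(p_c / p_a)
    # Highest score first; ties broken alphabetically for deterministic output.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:top_k]


if __name__ == "__main__":
    root = Node("Root")
    sci = root.add_child(Node("Science", ["physics quantum experiment"]))
    sci.add_child(Node("Physics", ["quantum particle physics"]))
    root.add_child(Node("Sports", ["football match score"]))
    root.add_child(Node("Arts", ["painting gallery exhibit"]))

    content, anti = content_and_anti_content(sci)
    print(extract_features(content, anti, top_k=4))
```

Selecting "Science" places its own document and its "Physics" sub-node's document in the content class, while the "Sports" and "Arts" siblings form the anti-content class; the highest-ranked features are then terms common in Science pages and absent from the siblings. Contrasting a node only with its siblings keeps each comparison local to one level of the tree, which is one plausible reason such a scheme can scale to very large hierarchies.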

Description

RELATED APPLICATIONS

[0001] This application is a non-provisional application of the earlier filed provisional application No. 60/289,418, filed on May 7, 2001, and claims priority to the earlier filed '418 provisional application, whose specification is hereby fully incorporated by reference.

BACKGROUND OF THE INVENTION

[0002] 1. Field of the Invention

[0003] The invention relates to the field of data processing. More specifically, the invention relates to the automatic analysis of the content of electronic data objects and the categorization of the electronic data objects into one or more discrete categories.

[0004] 2. Background Information

[0005] The Internet consists of billions of discrete pages, which can be accessed from any browser-equipped computer or appliance connected to the World Wide Web (hereinafter "Web"). The availability of so many pages simultaneously represents both a boon and a bane to the user. Information, opinion, and news are available about a vast array of...

Application Information

IPC (8): G06F17/30
CPC: G06F17/30864; G06F17/30873; Y10S707/955; Y10S707/99943; Y10S707/914; Y10S707/99937; Y10S707/916; Y10S707/956; Y10S707/915; Y10S707/917; G06F16/951; G06F16/954; G06F16/3323
Inventors: LULICH, DANIEL P.; GUILAK, FARZIN G.
Owner: MICROSOFT TECH LICENSING LLC