Page aggregation for web sites

A page aggregation technology for Web sites. It addresses the problems of wasted user time, results cluttered with pages or sites unrelated to electronic commerce, and the inability to efficiently and accurately ...

Inactive Publication Date: 2009-07-21
VERIZON SERVICES GROUP

AI Technical Summary

Problems solved by technology

First, to the extent that it is not possible to efficiently and accurately select pages that belong to sites from which electronic commerce can be carried out, a potential electronic commerce user seeking a list of pages or sites that may interest him will also receive too many pages or sites that are unrelated to electronic commerce.
This will both waste his time and frustrate him.
Moreover, to the extent that pages that are part of electronic commerce sites are missed, the user will not receive as complete a list of potentially useful electronic commerce Web pages or sites as he otherwise would.
In determining whether a page is part of an electronic commerce site, however, it is not always possible to rely exclusively on information on that page; it is sometimes useful to make the determination based upon the characteristics of other pages in the site.
Prior efforts to solve this problem have not been completely successful.
If one simply assumes that two pages are parts of different sites whenever they are on separate servers, one misses many pages in large sites that are spread over multiple servers.
Nor is it useful to assume that any two pages that are linked are part of the same site.
Treating all linked pages as part of a single site would lead to vastly overestimating the size of a typical Web site.
Finally, it is not sufficient simply to conclude that all pages that share the same URL (uniform resource locator) server hostname are part of the same site.
However, any such effort would be complicated, slow to execute, and of limited accuracy, given the similarity of content between similar sites that may be linked in some circumstances, and on the other hand the variety of content that may be contained within a single site in other circumstances.
Nor is the need for such a technique limited to the problem of classifying Web pages as being part of electronic commerce sites or not.
The more material the Web contains, the more difficult it becomes for a user to formulate a specific search criterion that returns useful pages or sites ranked in order of potential interest to him, without returning so many pages or sites that he is overwhelmed.
Efforts to circumvent this problem to date have not been completely successful.
Users may conduct multiple searches, starting anew each time, but this is wasteful of their time, and frustrating, and their later efforts may be no more successful than their initial ones.
Users may try to guess how to modify a prior search to yield more useful results, but such efforts too may be unsuccessful, leaving users to spend substantial amounts of time sifting through material that is not of interest to find the minority of useful material.
Another problem is that if a search fails to locate certain useful material, the user may not even be aware that that has happened.
But such methods of site selection may not produce the sites that would be most useful to the user, and also may leave the user feeling that his interests have been subordinated to those of advertisers and others.
These problems in efficiently finding the sites of most use to the user may discourage people from taking full advantage of Web resources, and in particular from using the Web for electronic commerce purposes.
But none of these efforts has been fully successful.
Moreover, they all share a single common deficiency.
Because users often do not know, when they begin, exactly what they want or where the material they want is most likely to be located, they may be unable to describe the target of their search with any precision.
Thus, any such algorithm, no matter how sophisticated, can only yield results of limited usefulness.
Another group of shortcomings in current methodology that limits the ability to provide useful lists of electronic commerce sites to potential users is the difficulty of maintaining information about pages or sites on the Web in a form that is convenient and quick to use.
It is recognized that a new full search of the Web in response to each inquiry would take excessive time and computer resources to be feasible for most purposes.
However, the Web is so large that it is not desirable to conduct a full new search of the Web for documents containing the specified word in response to the request.
Current techniques for Web searching and retrieval that maintain information about the documents in the collection only by means of inverted term lists, and not in an accessible database, pose problems.
In particular, they organize and maintain information by the terms of interest rather than by the underlying document.
This leads to a number of problems in providing useful lists of documents in response to user inquiries, which will now be discussed.
One problem that results from the failure to maintain information organized by the underlying document is the difficulty of maintaining accurate and up-to-date inverted term lists.
This is a problem because, in order for inverted term lists to be useful, they must be reasonably accurate.
If, however, as in the case of the Web, and electronic commerce in particular, the collection is dynamic, with documents being modified or even deleted frequently, inverted term lists can quickly become inaccurate.
This is a problem because, when a user makes a request, and inverted term lists are used to determine which documents may be responsive, incorrect documents will be returned if there have been changes in underlying documents in the collection which are not reflected in inverted term lists.
This process may be very time consuming.
In the case of document collections as extensive as the Web, or even simply of all electronic commerce sites on the Web, there are a very large number of inverted term lists, and many of the inverted term lists may be very long.
Some prior efforts to avoid this problem have been unsatisfactory.
This process has the advantage of reducing the computer resources that must be devoted to the process of updating lists, but the disadvantage is that significant resources are still consumed, and moreover grouping changes introduces delays in the updating process which reduce the accuracy of the results produced when the inverted term lists are used in responding to search queries.
Other problems also stem from the fact that conventional methods generally do not store information in a manner which is organized by document.
There is a further problem that occurs as a result of the fact that some conventional methods do not store information in a manner organized by document.
It is recognized that searches for useful documents can take a relatively long time to process.
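
The staleness problem described above for inverted term lists can be illustrated with a small sketch. It is not taken from the patent: the function names (`terms_of`, `build_index`, `search`, `apply_change`), the whitespace tokenizer, and the sample pages are all hypothetical. It shows term lists returning incorrect documents after the collection changes, and one possible way a per-document record of terms could limit an update to the affected lists.

```python
from collections import defaultdict


def terms_of(text):
    """Split text into a set of lowercase terms (deliberately naive tokenizer)."""
    return set(text.lower().split())


def build_index(docs):
    """Build inverted term lists plus a per-document record of terms."""
    term_lists = defaultdict(set)   # term -> ids of documents containing it
    doc_terms = {}                  # document id -> terms it held when indexed
    for doc_id, text in docs.items():
        doc_terms[doc_id] = terms_of(text)
        for term in doc_terms[doc_id]:
            term_lists[term].add(doc_id)
    return term_lists, doc_terms


def search(term_lists, term):
    """Return the ids of documents listed for a term, in sorted order."""
    return sorted(term_lists.get(term.lower(), set()))


def apply_change(term_lists, doc_terms, doc_id, new_text=None):
    """Reflect one modified (or, with new_text=None, deleted) document.

    Assumption: the per-document record tells us which lists to touch,
    so the remaining term lists do not need to be rebuilt or scanned.
    """
    old = doc_terms.pop(doc_id, set())
    new = terms_of(new_text) if new_text is not None else set()
    for term in old - new:
        term_lists[term].discard(doc_id)
    for term in new - old:
        term_lists[term].add(doc_id)
    if new_text is not None:
        doc_terms[doc_id] = new


# Hypothetical two-page collection.
docs = {"p1": "red shoes for sale", "p2": "shoes repair guide"}
term_lists, doc_terms = build_index(docs)

# Suppose p1 now sells hats and p2 was deleted, but the lists are not yet updated:
stale = search(term_lists, "shoes")      # still names both pages

# Bring the lists up to date by touching only the affected terms.
apply_change(term_lists, doc_terms, "p1", "red hats for sale")
apply_change(term_lists, doc_terms, "p2")
fresh = search(term_lists, "shoes")      # no page mentions shoes any more
```

The sketch trades extra storage (the per-document term record) for cheaper updates. Whether that matches the approach of the full specification cannot be determined from the text excerpted here.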


Embodiment Construction

[0060] Referring to FIG. 1, a computer system 1 includes a workstation 2 having local storage 3. The workstation may also be connected to a local area network 4 and may access the Internet 5. The Internet 5 may include or be coupled to remote storage 6. The workstation 2 may be any one of a variety of commercially available computers capable of providing the functionality described in more detail below. The local storage 3 may include ROM, RAM, a hard disk, a CD, or any other media capable of containing data and/or programs for the workstation 2 or other data. The local area network 4, which is coupled to and exchanges data with the workstation, may also contain data and/or program information for use by the workstation 2. The Internet 5 may be accessed in a conventional manner by the workstation 2. Alternatively, the workstation 2 may access the Internet 5 through the local area network 4, as shown by the dotted line of FIG. 1. The remote storage 6 may also contain data and/or p...

Abstract

Disclosed are a method, a device, and a computer storage medium for determining whether two linked pages on the World Wide Web are part of the same Web site. The method involves examining and comparing the IP addresses of the Web pages. It can also be extended to finding other pages to which a given Web page is linked on the Web, and to determining whether a Web page of interest is part of a Web site with a desired characteristic, such as being an electronic commerce site.
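
As a rough illustration of the idea of comparing the IP addresses of two linked pages, the following sketch may help. The abstract does not say how the addresses are compared, so everything specific here is an assumption: the function name `same_site_by_ip`, the injected `resolve` callable (a stand-in for a DNS lookup), the `prefix_len` parameter, and the example hostnames and addresses, which come from ranges reserved for documentation.

```python
import ipaddress
from urllib.parse import urlsplit


def same_site_by_ip(url_a, url_b, resolve, prefix_len=32):
    """Guess whether two linked pages belong to one site by comparing IPv4 addresses.

    resolve:    hypothetical callable mapping a hostname to an iterable of
                IPv4 address strings (for example, a wrapper around DNS).
    prefix_len: number of leading address bits that must match; 32 means an
                exact match. This threshold is an assumption of the sketch,
                not a value stated in the abstract.
    """
    host_a = urlsplit(url_a).hostname
    host_b = urlsplit(url_b).hostname
    if host_a is None or host_b is None:
        return False

    def networks(host):
        # Reduce each address to the network formed by its leading bits.
        return {
            ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
            for ip in resolve(host)
        }

    # The pages are treated as one site if any address prefix is shared.
    return bool(networks(host_a) & networks(host_b))


# Hypothetical lookup table standing in for real DNS resolution.
FAKE_DNS = {
    "www.shop.example": ["192.0.2.10"],
    "cart.shop.example": ["192.0.2.10"],
    "img.shop.example": ["192.0.2.77"],
    "other.example": ["198.51.100.7"],
}


def resolve(host):
    return FAKE_DNS.get(host, [])


same_server = same_site_by_ip(
    "http://www.shop.example/a", "http://cart.shop.example/b", resolve)
unrelated = same_site_by_ip(
    "http://www.shop.example/a", "http://other.example/", resolve)
nearby_exact = same_site_by_ip(
    "http://www.shop.example/a", "http://img.shop.example/x.png", resolve)
nearby_loose = same_site_by_ip(
    "http://www.shop.example/a", "http://img.shop.example/x.png", resolve,
    prefix_len=24)
```

Passing the resolver in as a parameter keeps the sketch testable without network access. Loosening `prefix_len` is one conceivable way to group the servers of a large site that spans several machines, at the cost of possibly merging unrelated sites that share an address block.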

Description

[0001] This application is a continuation of U.S. patent application Ser. No. 09/364,782 filed on Jul. 30, 1999, which application is assigned to the assignee of the present application. The contents of the Ser. No. 09/364,782 application are expressly incorporated by reference herein in its entirety.

TECHNICAL FIELD

[0002] This invention relates to techniques for determining the relationship between pages on the World Wide Web, and more particularly to methods of determining if pages belong to the same Web site.

BACKGROUND OF THE INVENTION

[0003] The Internet, of which the World Wide Web is a part, consists of a series of interlinked computer networks and servers around the world. Users of one server or network which is connected to the Internet may send information to, or access information on, any other network or server connected to the Internet by the use of various computer programs which allow such access, such as Web browsers. The information is sent to or received from a network...

Application Information

Patent Type & Authority: Patents (United States)
IPC: IPC(8): G06F15/16
Inventor: PONTE, JAY MICHAEL
Owner: VERIZON SERVICES GROUP