
1239 results about "Web crawler" patented technology

A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web, typically for the purpose of Web indexing (web spidering).
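As a rough illustration of "systematically browses", the sketch below does a breadth-first crawl: fetch a page, extract its links, and queue any URL not seen before. The `crawl` function, the injected `fetch` callable, and the in-memory `PAGES` site are all assumptions made for this example so that it runs without network access; a real crawler would also need politeness delays, robots.txt handling, and error handling.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkExtractor(HTMLParser):
    """Collects the href values of <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed, fetch, max_pages=100):
    """Breadth-first crawl from seed; fetch(url) returns HTML text or None."""
    seen = {seed}
    queue = deque([seed])
    order = []
    while queue and len(order) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # page unavailable, skip it
        order.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop #fragments before comparing.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return order


# Hypothetical in-memory "web" standing in for real HTTP requests.
PAGES = {
    "http://example.test/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.test/a": '<a href="/b">B</a> <a href="/">home</a>',
    "http://example.test/b": '<a href="/a#top">A</a>',
}

visited = crawl("http://example.test/", PAGES.get)
```

Passing `fetch` as a parameter keeps the traversal logic separate from the network layer, so the same loop can be tested against a dictionary or pointed at a real HTTP client.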

Photo Automatic Linking System and method for accessing, linking, and visualizing "key-face" and/or multiple similar facial images along with associated electronic data via a facial image recognition search engine

Active | US20070172155A1 | Quick search | Enhanced and improved organization, classification, and fast sorts and retrieval | Digital data information retrieval | Character and pattern recognition | Health professionals | Web crawler
The present invention provides a system and method for input of images containing faces for accessing, linking, and/or visualizing multiple similar facial images and associated electronic data for new online commercial, medical, and training uses. The system uses various image capturing devices and communication devices to capture images and enter them into a facial image recognition search engine. Embedded facial image recognition techniques within the image recognition search engine extract facial images and encode the extracted facial images in a computer-readable format. The processed facial images are then entered for comparison into at least one database populated with facial images and associated information. Once the newly captured facial images are matched with similar “best-fit match” facial images in the facial image recognition search engine's database, the “best-fit” matching images and each image's associated information are returned to the user. Additionally, the newly captured facial image can be automatically linked to the “best-fit” matching facial images, and comparisons can be calculated and/or visualized.
Key new uses of the system include, but are not limited to: input of user-selected facial images to find multiple similar celebrity look-alikes, with automatic linking that returns the look-alike celebrities' images, associated electronic information, and convenient opportunities to purchase fashion, jewelry, products, and services to better mimic those celebrities; health monitoring and diagnostic use, by conveniently organizing and superimposing periodically captured patient images so that health professionals can view patients' progress; entirely new classes of training based on semi-transparent superimposed images, in which users train their faces to mimic other similar faces, such as celebrity look-alike cosmetic applications and/or facial expressions; intuitive automatic linking of similar facial images for improved organization, classification, and fast retrieval; and an improved method of facial-image-based indexing and retrieval of information from the Web, USENET, and other resources searched by a web crawler or spider, providing new kinds of intuitive, easy-to-use searching and/or combined use with current keyword searching for optimized searching.
Owner:VR REHAB INC +2

Systems and methods for inferring uniform resource locator (URL) normalization rules

Different URLs that actually reference the same web page or other web resource are detected and that information is used to only download one instance of a web page or web resource from a web site. All web pages or web resources downloaded from a web server are compared to identify which are substantially identical. Once identical web pages or web resources with different URLs are found, the different URLs are then analyzed to identify what portions of the URL are essential for identifying a particular web page or web resource, and what portions are irrelevant. Once this has been done for each set of substantially identical web pages or web resources (also referred to as an “equivalence class” herein), these per-equivalence-class rules are generalized to trans-equivalence-class rules. There are two rule-learning steps: step (1), where it is learned for each equivalence class what portions of the URLs in that class are relevant for selecting the page and what portions are not; and step (2), where the per-equivalence-class rules constructed during step (1) are generalized to rules that cover many equivalence classes. Once a rule is determined, it is applied to the class of web pages or web resources to identify errors. If there are no errors, the rule is activated and is then used by the web crawler for future crawling to avoid the download of duplicative web pages or web resources.
Owner:MICROSOFT TECH LICENSING LLC
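The two rule-learning steps and the final error check described above can be sketched as follows. This is a minimal illustration, not the patented method: it assumes the only variable parts of a URL are its query parameters, treats "substantially identical" as exact content equality, and uses hypothetical names (`PAGES`, `varying_params`, `infer_irrelevant_params`, `min_support`) and a made-up `shop.test` site.

```python
from collections import Counter, defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def group_by_content(pages):
    """Equivalence classes: sets of URLs that returned identical content."""
    classes = defaultdict(list)
    for url, content in pages.items():
        classes[content].append(url)
    return [urls for urls in classes.values() if len(urls) > 1]


def varying_params(urls):
    """Step 1: parameters whose value differs inside one equivalence class."""
    queries = [dict(parse_qsl(urlsplit(u).query)) for u in urls]
    keys = set().union(*queries)
    return {k for k in keys if len({q.get(k) for q in queries}) > 1}


def normalize(url, drop):
    """Remove the parameters in `drop`; remaining parameters are sorted."""
    parts = urlsplit(url)
    kept = sorted((k, v) for k, v in parse_qsl(parts.query) if k not in drop)
    return urlunsplit(parts._replace(query=urlencode(kept)))


def rule_is_safe(pages, drop):
    """Error check: no two different pages may share a normalized URL."""
    seen = {}
    for url, content in pages.items():
        if seen.setdefault(normalize(url, drop), content) != content:
            return False
    return True


def infer_irrelevant_params(pages, min_support=2):
    """Step 2: keep parameters that vary in at least min_support classes."""
    counts = Counter()
    for urls in group_by_content(pages):
        counts.update(varying_params(urls))
    candidates = {p for p, n in counts.items() if n >= min_support}
    return candidates if rule_is_safe(pages, candidates) else set()


# Hypothetical crawl results: URL -> downloaded content.
PAGES = {
    "http://shop.test/item?id=1&sid=aaa": "page one",
    "http://shop.test/item?id=1&sid=bbb": "page one",
    "http://shop.test/item?id=2&sid=ccc": "page two",
    "http://shop.test/item?id=2&sid=ddd": "page two",
    "http://shop.test/item?id=3&sid=eee": "page three",
}

rule = infer_irrelevant_params(PAGES)
canonical = normalize("http://shop.test/item?id=3&sid=zzz", rule)
```

Here `sid` varies within both equivalence classes while `id` stays fixed, so only `sid` is marked irrelevant. The safety check would reject a rule that also dropped `id`, because pages with different content would then collapse onto one URL.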

Similar web page duplicate-removing system based on parallel programming mode

The invention provides a similar web page duplicate-removing system based on a parallel programming mode, comprising a web page content pre-processing module, a web page eigenvector extraction module, a web page feature fingerprint calculation module, a web page fingerprint online duplicate-removing module, a web page fingerprint distributed batch duplicate-removing module, and a computing platform based on specific distribution. For web pages obtained by web crawlers, the system performs steps such as unified conversion of text content encoding, standardization of document structure, removal of web page noise content, analysis and identification of the pages' thematic content, and lexical segmentation of continuous text, thereby forming eigenvectors that represent the web pages. Relevant algorithms can then be applied to these vectors to obtain web page fingerprints that represent the pages' characteristics. The system accurately and quickly detects exact or near-duplicate web page content caused by site mirroring, reposting of web documents, and the like across the massive amount of data on the Internet, and performs the corresponding duplicate removal, thereby improving the storage efficiency of search engines and giving their users a better experience.
Owner:HUAZHONG UNIV OF SCI & TECH
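The abstract does not name the fingerprint algorithm. As one possible illustration of the pipeline (text to feature vector to fingerprint to online duplicate check), the sketch below uses SimHash, a commonly used near-duplicate fingerprint, as a stand-in. The tokenizer is a plain word-regex substitute for the lexical segmentation step, and the function names, the 64-bit width, and the distance threshold of 3 are assumptions made for this example.

```python
import hashlib
import re
from collections import Counter

BITS = 64  # fingerprint width (assumed for this sketch)


def features(text):
    """Feature vector: lowercase word tokens mapped to their counts."""
    return Counter(re.findall(r"\w+", text.lower()))


def simhash(text):
    """SimHash fingerprint: each token votes on every bit, weighted by count."""
    totals = [0] * BITS
    for token, weight in features(text).items():
        digest = hashlib.md5(token.encode("utf-8")).digest()
        value = int.from_bytes(digest[:8], "big")  # stable 64-bit token hash
        for i in range(BITS):
            totals[i] += weight if (value >> i) & 1 else -weight
    return sum(1 << i for i in range(BITS) if totals[i] > 0)


def hamming(a, b):
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def deduplicate(docs, max_distance=3):
    """Online pass: keep a document only if no kept fingerprint is close."""
    kept, prints = [], []
    for doc in docs:
        fp = simhash(doc)
        if all(hamming(fp, seen) > max_distance for seen in prints):
            kept.append(doc)
            prints.append(fp)
    return kept


original = "Web crawlers download pages and index their content."
mirror = "  web CRAWLERS download pages, and index their content!! "
unrelated = "Seven purple giraffes quietly juggled frozen mangoes yesterday."

kept = deduplicate([original, mirror, unrelated])
```

The mirrored copy differs only in case, whitespace, and punctuation, so it yields the same tokens and the same fingerprint as the original and is dropped. The linear scan over stored fingerprints is only suitable for small collections; a batch or distributed variant would need an index over fingerprints instead.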