122 results about "Anchor text" patented technology

Anchor text, also called link label or link text, is the visible, clickable text in an HTML hyperlink. The term "anchor" was used in older versions of the HTML specification for what is now called the "a element". The HTML specification has no specific term for anchor text and refers to it as "text that the a element wraps around". In XML terms (which apply to XHTML, the XML serialization of HTML), the anchor text is the content of the a element, provided that the content is text.
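
As an illustration of the definition above, the following minimal sketch collects the anchor text of each a element using Python's standard `html.parser` module. The class and function names (`AnchorTextParser`, `extract_anchor_texts`) are hypothetical, and the sketch assumes anchors are not nested.

```python
from html.parser import HTMLParser


class AnchorTextParser(HTMLParser):
    """Collects (href, anchor text) pairs from <a> elements."""

    def __init__(self):
        super().__init__()
        self.links = []          # finished (href, text) pairs
        self._href = None        # href of the <a> currently open, if any
        self._in_anchor = False
        self._chunks = []        # text fragments seen inside the open <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_anchor = True
            self._href = dict(attrs).get("href")
            self._chunks = []

    def handle_data(self, data):
        # Text inside nested inline tags (e.g. <b>) still belongs to the anchor.
        if self._in_anchor:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_anchor:
            # Join fragments and collapse runs of whitespace.
            text = " ".join("".join(self._chunks).split())
            self.links.append((self._href, text))
            self._in_anchor = False


def extract_anchor_texts(html):
    """Return a list of (href, anchor text) pairs found in an HTML string."""
    parser = AnchorTextParser()
    parser.feed(html)
    parser.close()
    return parser.links
```

Calling `extract_anchor_texts` on a page fragment returns one pair per link, with text from nested inline elements included in the anchor text.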

System and methods for automatic clustering of ranked and categorized search objects

A search results page includes multiple result lists generated by multiple clustering operations applied to an initial match set of documents selected based on a user query. A first result list is constructed by clustering a top-n set of documents by primary domain address and sorting based on extrinsic ranking factors, such that the first list includes a ranked and ordered list of primary domain linked anchor text. A second result list is constructed by clustering the top-n set of documents based on a unified ranked occurrence of keywords within the top-n set of documents. The generated second list contains a plurality of cluster class references, each including a ranked, ordered sub-list of the keywords that occur within the top-n set of documents and are associated with that cluster class reference. Each keyword in these sub-lists includes linking references to a corresponding one of the top-n set of documents. A third result list is constructed by clustering the top-n set of documents based on a ranked frequency of occurrence of internally linked anchor texts. The generated third result list includes the top-n set of the internally linked anchor texts and respective ranked and ordered sub-lists of linking references to primary domain Web pages containing the corresponding internally linked anchor text.
Owner:YEBOL CORP
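
Two of the clustering operations in the abstract above can be sketched roughly as follows. This is not the patented method: it assumes the URL host name stands in for the "primary domain address", a single numeric score stands in for the "extrinsic ranking factors", and the function names are hypothetical.

```python
from collections import Counter, defaultdict
from urllib.parse import urlsplit


def cluster_by_domain(docs):
    """Group (url, score) pairs by host name, best-scoring cluster first."""
    clusters = defaultdict(list)
    for url, score in docs:
        clusters[urlsplit(url).hostname].append((url, score))
    # Order documents inside each cluster by descending score.
    for members in clusters.values():
        members.sort(key=lambda m: m[1], reverse=True)
    # Order clusters by the score of their best document.
    return sorted(clusters.items(), key=lambda kv: kv[1][0][1], reverse=True)


def rank_anchor_texts(anchor_texts):
    """Rank anchor texts by frequency of occurrence, most frequent first."""
    return Counter(t.strip().lower() for t in anchor_texts).most_common()
```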

Timely and high-efficiency crawling method for internet information

Active | CN103176985A | Simplify complexity | Simplify the problem of resource allocation | Special data processing applications | Time changes | The Internet
The invention discloses a timely and high-efficiency crawling method for internet information and belongs to the technical field of information. The method comprises the following steps: (1) setting a seed address, crawling and storing webpage information, and determining the navigation pages; (2) crawling each navigation page more than once, and analyzing and labeling the crawled webpages; (4) building a theme judgment model and a navigation page change time series prediction model for each website; (5) predicting the next change time of each website's navigation page, determining the next crawl time, crawling the navigation page, and extracting the subpage addresses and anchor texts that have not yet been crawled; (6) using the theme judgment model to judge the subpage addresses and anchor texts extracted in the previous step, and processing each according to the judgment result; (7) based on newly crawled theme-related pages, forming or updating the current change time series of each website's navigation page, and determining the next crawl time for webpage crawling. The method keeps the collected information fresh and on topic under a small load.
Owner:COMP NETWORK INFORMATION CENT CHINESE ACADEMY OF SCI
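
Step (5) of the abstract above can be illustrated with a small sketch. The abstract does not specify the time series prediction model, so this example substitutes a simple stand-in (last observed change plus the mean interval between past changes); the function names are hypothetical and times are plain numbers such as Unix timestamps.

```python
def predict_next_change(change_times):
    """Predict the next change time as the last change plus the mean past interval.

    This is a stand-in for the prediction model, which the abstract does not detail.
    """
    if len(change_times) < 2:
        raise ValueError("need at least two observed change times")
    intervals = [b - a for a, b in zip(change_times, change_times[1:])]
    return change_times[-1] + sum(intervals) / len(intervals)


def select_uncrawled(links, crawled_urls):
    """Keep (url, anchor_text) pairs whose URL has not been crawled yet."""
    return [(url, text) for url, text in links if url not in crawled_urls]
```

A scheduler could revisit a navigation page at the predicted time and pass only the pairs returned by `select_uncrawled` to the theme judgment step.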

Accessibility web browsing method based on linkage cluster

An accessibility web browsing method based on link clustering comprises the following steps: web pages are fetched from the Internet to obtain the links they contain; the URL text and anchor text corresponding to each link are extracted, together with the text of the web page that each link points to; keywords are obtained from the URL text, the anchor text and the corresponding page text; the keywords are used as features to represent each link formally as a link vector composed of keyword information, wherein d_i is the weight of the i-th keyword in the link vector; and a clustering algorithm is applied to the link vectors so that links on the same subject fall into one group, and the web page is redisplayed with its links grouped. The advantage of the invention is that the links of each web page are clustered so that they are displayed in a more compact, grouped manner. The method is suitable for all kinds of web pages and requires no manual back-end work; blind users can use it for accessible web browsing, and other users can use it to improve the quality of their browsing.
Owner:ZHEJIANG UNIV
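
The link-vector idea in the abstract above can be sketched as follows. The abstract names neither the weighting scheme nor the clustering algorithm, so this example assumes raw term counts as the weights d_i and uses a greedy single-pass clustering on cosine similarity as a stand-in; the function names and the threshold value are hypothetical.

```python
import math
from collections import Counter


def link_vector(url_text, anchor_text, page_text):
    """Represent one link as keyword -> weight (raw term counts in this sketch)."""
    words = f"{url_text} {anchor_text} {page_text}".lower().split()
    return Counter(words)


def cosine(u, v):
    """Cosine similarity between two sparse keyword-weight vectors."""
    dot = sum(w * v[k] for k, w in u.items() if k in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def cluster_links(vectors, threshold=0.5):
    """Greedy clustering: a link joins the first group whose first member is similar enough."""
    groups = []  # each group is a list of indices into `vectors`
    for i, vec in enumerate(vectors):
        for group in groups:
            if cosine(vec, vectors[group[0]]) >= threshold:
                group.append(i)
                break
        else:
            groups.append([i])
    return groups
```

The returned index groups can then drive a regrouped display of the page's links, one block per group.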

Audio/video intelligent catalog information acquisition method facing to wide area network

The invention relates to an intelligent audio/video catalog information acquisition method for wide area networks and belongs to the field of computer applications. It provides a weighting algorithm based on the position of keyword feature items: features at different positions in a file are given different weighting factors, so that the theme similarity of webpage contents can be calculated more accurately. Links with higher theme similarity are preferentially selected by combining three factors: the similarity of the webpage contents, the URL (Uniform Resource Locator) directory-level information of the hyperlink, and the anchor text of the hyperlink. The cataloguing information of the retrieved theme webpages is extracted automatically using an information extraction method based on the body and HTML (Hypertext Markup Language), and the extracted cataloguing information is standardized using an improved semantic similarity calculation method. The advantages of the invention are that description item information is provided to the cataloguer intelligently and automatically, the workload of staff is reduced, cataloguing efficiency is improved, and the method adapts to the different requirements of professional and non-professional cataloguers and to the wide area network environment.
Owner:COMMUNICATION UNIVERSITY OF CHINA
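
The position-based weighting and the three-factor link selection in the abstract above can be sketched as below. The abstract gives no concrete values or formulas, so the position names, the weights in `POSITION_WEIGHTS`, the depth scoring, and the combination weights are all assumptions chosen for illustration.

```python
# Hypothetical position weights: a keyword in the title counts more than one in the body.
POSITION_WEIGHTS = {"title": 3.0, "heading": 2.0, "body": 1.0}


def weighted_vector(fields):
    """Map position name -> text to keyword -> position-weighted count."""
    vec = {}
    for position, text in fields.items():
        weight = POSITION_WEIGHTS.get(position, 1.0)
        for word in text.lower().split():
            vec[word] = vec.get(word, 0.0) + weight
    return vec


def link_priority(content_sim, url_depth, anchor_sim, weights=(0.5, 0.2, 0.3)):
    """Combine content similarity, URL directory depth, and anchor-text similarity.

    Similarities are expected in [0, 1]; url_depth is a non-negative integer.
    """
    depth_score = 1.0 / (1 + url_depth)  # assumption: shallower URLs score higher
    w_content, w_depth, w_anchor = weights
    return w_content * content_sim + w_depth * depth_score + w_anchor * anchor_sim
```

A crawler following this sketch would visit candidate links in descending order of `link_priority`.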