
124 results about "Anchor text" patented technology

The anchor text, link label or link text is the visible, clickable text in an HTML hyperlink. The term "anchor" was used in older versions of the HTML specification for what is currently referred to as the "a element". The HTML specification does not have a specific term for anchor text, but refers to it as "text that the a element wraps around". In XML terms (for example in XHTML, the XML serialization of HTML), the anchor text is the content of the a element, provided that the content is text.
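
As an illustration of the definition above, here is a minimal sketch that collects (href, anchor text) pairs using Python's standard html.parser module. It treats only a elements that carry an href as hyperlinks and collapses whitespace in the wrapped text. Both choices are assumptions made for this example.

```python
from html.parser import HTMLParser


class AnchorTextExtractor(HTMLParser):
    """Collects (href, anchor text) pairs from <a href="..."> elements."""

    def __init__(self):
        super().__init__()
        self.links = []     # (href, anchor_text) tuples in document order
        self._href = None   # href of the <a> element currently open, if any
        self._parts = []    # text fragments seen inside that element

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._parts = []

    def handle_data(self, data):
        if self._href is not None:
            self._parts.append(data)   # includes text inside nested tags like <b>

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = " ".join("".join(self._parts).split())  # collapse whitespace
            self.links.append((self._href, text))
            self._href = None


def extract_anchor_text(html):
    parser = AnchorTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.links
```

For example, `extract_anchor_text('<a href="/faq">FAQ</a>')` yields `[("/faq", "FAQ")]`.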

Generating hyperlinks and anchor text in HTML and non-HTML documents

Systems and methods for generation of hyperlinks and anchor text from data such as reference text in HTML and in non-HTML documents are disclosed. The method generally includes locating a text reference in a source document, searching using a search engine for a target document relating to the text reference, computing anchor text from the text reference, generating a hyperlink to the target document, and associating the hyperlink with the computed anchor text. The locating and / or computing may be based on a respective statistical model of text formatting and / or lexical cues. The text reference may be parsed into pieces such that the searching, computing, generating, and associating are performed for each piece of text. The source document may be an HTML or non-HTML document. The text reference may be a reference to, for example, a paper, article, company, institution, product, search engine, image, object, and geographical location.
Owner:GOOGLE LLC
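
The locate / search / compute / link steps in the abstract above can be sketched as follows. This is not the patented method. Instead of statistical models of formatting and lexical cues, it assumes that a reference is a double-quoted title. The search engine is replaced by a hypothetical dictionary lookup (`search_target`), and the computed anchor text is simply the title itself.

```python
import html
import re

# Assumption: a text reference is a double-quoted title of 3-200 characters.
QUOTED_TITLE = re.compile(r'"([^"]{3,200})"')


def search_target(query, index):
    """Hypothetical stand-in for a search engine: exact-title lookup in a dict."""
    return index.get(query.lower())


def link_references(text, index):
    """Replace each located reference that resolves to a target with an <a> element."""
    def replace(match):
        reference = match.group(1)
        target = search_target(reference, index)
        if target is None:
            return match.group(0)          # leave unresolved references untouched
        anchor_text = reference.strip()    # computed anchor text: the title itself
        return '<a href="{}">{}</a>'.format(
            html.escape(target, quote=True), html.escape(anchor_text))
    return QUOTED_TITLE.sub(replace, text)
```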

Systems and methods for using anchor text as parallel corpora for cross-language information retrieval

A system performs cross-language query translations. The system receives a search query that includes terms in a first language and determines possible translations of the terms of the search query into a second language. The system also locates documents for use as parallel corpora to aid in the translation by: (1) locating documents in the first language that contain references that match the terms of the search query and identify documents in the second language; (2) locating documents in the first language that contain references that match the terms of the query and refer to other documents in the first language and identify documents in the second language that contain references to the other documents; or (3) locating documents in the first language that match the terms of the query and identify documents in the second language that contain references to the documents in the first language. The system may use the second language documents as parallel corpora to disambiguate among the possible translations of the terms of the search query and identify one of the possible translations as a likely translation of the search query into the second language.
Owner:GOOGLE LLC
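
The disambiguation step above can be illustrated with a simplified sketch. It assumes that the second-language documents have already been located through links, and it picks whichever candidate translation appears in the largest number of those documents. The counting rule is an assumption made for illustration, not the scoring described in the patent.

```python
from collections import Counter


def pick_translation(candidates, second_language_docs):
    """Choose the candidate translation found in the most second-language documents.

    candidates: possible translations of the query into the second language.
    second_language_docs: texts of second-language documents located via
        references in first-language documents (used as a parallel corpus).
    """
    counts = Counter()
    for text in second_language_docs:
        lowered = text.lower()
        for candidate in candidates:
            if candidate.lower() in lowered:
                counts[candidate] += 1
    if not counts:
        return None
    # max() returns the first maximal candidate, so candidate order breaks ties.
    return max(candidates, key=lambda c: counts[c])
```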

System and methods for automatic clustering of ranked and categorized search objects

A search results page includes multiple search lists generated by multiple clustering operations applied to an initial match set of documents selected based on a user query. A first result list is constructed by clustering a top-n set of documents by primary domain address and sorting based on extrinsic ranking factors, so that the first list is a ranked, ordered list of primary-domain linked anchor text. A second result list is constructed by clustering the top-n set of documents based on a unified ranked occurrence of keywords within the top-n set of documents. The generated second list contains a plurality of cluster class references, each including a ranked, ordered sub-list of the keywords that occur within the top-n set of documents and are associated with that cluster class reference; each keyword in these sub-lists includes linking references to the corresponding documents in the top-n set. A third result list is constructed by clustering the top-n set of documents based on a ranked frequency of occurrence of internally linked anchor texts. The generated third result list includes the top-n internally linked anchor texts, each with a ranked, ordered sub-list of linking references to primary-domain Web pages containing that anchor text.
Owner:YEBOL CORP
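
The first result list above (top-n documents clustered by primary domain, then ordered by ranking factors) could be sketched like this. Two simplifications are assumed here. The "primary domain" is taken to be the host name without a leading "www.", and a single precomputed score per result stands in for the extrinsic ranking factors.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def primary_domain(url):
    """Assumption: the primary domain is the host name minus a leading 'www.'."""
    host = urlsplit(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def cluster_by_domain(top_n):
    """Group (url, anchor_text, score) results by primary domain.

    Clusters are ordered by their best score; within a cluster, results are
    ordered by score. Returns [(domain, [(anchor_text, url), ...]), ...].
    """
    clusters = defaultdict(list)
    for url, anchor_text, score in top_n:
        clusters[primary_domain(url)].append((score, anchor_text, url))
    ordered = sorted(clusters.items(),
                     key=lambda kv: max(s for s, _, _ in kv[1]), reverse=True)
    return [(domain, [(a, u) for _, a, u in sorted(items, reverse=True)])
            for domain, items in ordered]
```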

Personalizing anchor text scores in a search engine

A search engine identifies a list of documents from a set of documents in a database in response to a set of query terms. For each document in the list, the search engine determines an information retrieval score based on its content and the query terms, and also identifies a set of source documents that have links to the document and that also have anchor text satisfying a predefined requirement with respect to the query terms. The search engine calculates a personalized page importance score for each of the identified source documents according to a set of user-specific parameters and accumulates the personalized page importance scores to produce a personalized anchor text score for the document. The personalized anchor text score is then combined with the document's information retrieval score to generate a personalized ranking for the document. The documents are ordered according to their respective personalized rankings.
Owner:GOOGLE LLC
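
The scoring described above can be sketched as follows. The personalized importance of each source page is assumed to be precomputed from the user's parameters. The "predefined requirement" is assumed to mean that the anchor text contains every query term, and the final combination is assumed to be a weighted sum. None of these details are given in the abstract.

```python
def personalized_rank(query_terms, ir_score, inlinks, importance, weight=1.0):
    """Combine a document's IR score with a personalized anchor text score.

    inlinks: (source_doc, anchor_text) pairs for links pointing at the document.
    importance: source_doc -> personalized page importance for this user.
    """
    terms = {t.lower() for t in query_terms}
    anchor_score = 0.0
    for source, anchor_text in inlinks:
        # Assumed requirement: the anchor text contains every query term.
        if terms <= set(anchor_text.lower().split()):
            anchor_score += importance.get(source, 0.0)
    return ir_score + weight * anchor_score
```

Documents would then be sorted by this value in descending order to produce the personalized ranking.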

System and methods for inferring intent of website visitors and generating and packaging visitor information for distribution as sales leads or market intelligence

A system for inferring the intent of visitors to a Website has a visitor-tracking application executing from a digital medium coupled to a server hosting the Website, the server being connected to a repository adapted to store data about visitor behavior, and an inference engine for processing the data to infer the intent of visitors. Visitor behavior relative to links is tracked, and a visitor's intent is inferred from an analysis of that behavior, from the meaning deduced from the anchor text of the selected links, or from a combination of both.
Owner:CALLIDUS SOFTWARE

Using web structure for classifying and describing web pages

An enhanced method and system for classifying a target web page and describing a set of web pages using virtual documents, in which a virtual document comprises extended anchor text extracted from each of a plurality of web pages that include at least one hyperlink citing each target web page.
Owner:NEC LAB AMERICA
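
A virtual document, as described above, can be sketched by concatenating the extended anchor text of every link that cites the target. In this example, "extended" anchor text is assumed to mean the anchor words plus a fixed window of surrounding words. Citing pages are assumed to arrive as token lists annotated with (start, end, target) link spans. Both the window and the page layout are illustrative choices.

```python
def extended_anchor_text(tokens, start, end, window=3):
    """Anchor tokens[start:end] plus up to `window` tokens on each side."""
    lo = max(0, start - window)
    return tokens[lo:end + window]


def virtual_document(target, citing_pages, window=3):
    """Concatenate the extended anchor text of every link that cites `target`.

    citing_pages: list of (tokens, links), where links holds
    (start, end, target_url) spans over tokens.
    """
    words = []
    for tokens, links in citing_pages:
        for start, end, url in links:
            if url == target:
                words.extend(extended_anchor_text(tokens, start, end, window))
    return " ".join(words)
```

The resulting text can then be passed to any ordinary text classifier in place of, or alongside, the target page's own content.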

Search engine and method with improved relevancy, scope, and timeliness

Inactive | US20050004943A1 | Scope efficiency, Timeliness efficiency, Digital data information retrieval, Digital data processing details, Hyperlink, Paper document
A search engine and a method achieve timeliness of documents returned in a search result through a relevancy feedback mechanism driven by the frequency with which a URL is returned in recent searches. The relevancy feedback mechanism includes one or more random processes which determine whether or not a cached or indexed web page associated with a URL in the search result should be refreshed. In addition, the random processes also determine whether or not hyperlinks in the cached or indexed web page should be followed to access related web pages. Accesses of web pages resulting from the operations of the random processes are used to update any document index maintained by the search engine. Relevancy scoring functions implemented in look-up tables are also disclosed. A more accurate relevancy scoring function is achieved using a lexicon based on the anchor texts of extracted hyperlinks of web documents.
Owner:AFFINI
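
The random refresh decision above can be sketched as a coin flip whose probability grows with how often a URL appeared in recent results. The abstract does not specify the distribution, so this example assumes a linear probability capped at 1. It also assumes the link-following decision uses the same rule with a smaller rate, and the rate values are illustrative.

```python
import random


def should_refresh(recent_hits, rate=0.01, rng=random):
    """Randomly decide whether to act on a URL returned in recent searches.

    Assumption: probability = min(1, rate * recent_hits).
    """
    probability = min(1.0, rate * recent_hits)
    return rng.random() < probability


def plan_refreshes(result_urls, hit_counts, rng, refresh_rate=0.01, follow_rate=0.002):
    """Return (url, follow_links) for each result URL chosen for re-fetching."""
    plan = []
    for url in result_urls:
        hits = hit_counts.get(url, 0)
        if should_refresh(hits, refresh_rate, rng):
            follow = should_refresh(hits, follow_rate, rng)
            plan.append((url, follow))
    return plan
```

Passing a seeded `random.Random` instance as `rng` makes the decisions reproducible in tests.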

Automatic object reference identification and linking in a browseable fact repository

Links between facts associated with objects are automatically created and maintained in a fact repository. Names of objects are automatically identified in the facts and collected into a list of names. The facts are then processed to identify such names in the facts. Identified names are used as anchor text for search links. A search link includes a search query for a service engine that searches the fact repository for facts associated with objects having the same name.
Owner:GOOGLE LLC
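
Turning identified names into search links, as described above, can be sketched with a single regular expression built from the collected list of names. The search endpoint `https://search.example/?q=` is hypothetical. Matching longer names first, so that "New York City" is preferred over "New York", is a design choice made for this example.

```python
import html
import re
from urllib.parse import quote_plus


def link_object_names(fact, names, search_url="https://search.example/?q="):
    """Wrap every known object name in `fact` in a search link.

    names: object names collected from the fact repository.
    search_url: hypothetical query endpoint of the repository's search service.
    """
    if not names:
        return fact
    alternatives = sorted(names, key=len, reverse=True)   # longest match first
    pattern = re.compile(r"\b(" + "|".join(re.escape(n) for n in alternatives) + r")\b")
    return pattern.sub(
        lambda m: '<a href="{}{}">{}</a>'.format(
            search_url, quote_plus(m.group(1)), html.escape(m.group(1))),
        fact)
```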

Serving advertisements based on keywords related to a webpage determined using external metadata

Inactive | US20080027798A1 | Improve accuracy, Maximize advertiser revenue, Advertisements, Hyperlink, Inside information
Methods and apparatus for selecting advertisements to display to a user requesting a primary webpage are provided. Keywords related to the primary webpage are determined using internal information of the primary webpage and / or external information provided in neighboring webpages. The external information may include anchor text metadata of hyperlinks on neighboring webpages that link to the primary webpage, or the number of such hyperlinks that share the same anchor text. Other internal and / or external information may be used to determine a list of keywords related to the primary webpage. One or more keywords on the list are selected to represent the primary webpage according to one or more objectives. One or more advertisements are selected to be served to the user using the selected keywords. Machine learning techniques may be used to develop a model that automatically determines keywords representing a webpage.
Owner:YAHOO INC
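
The external-information signal above (the number of inbound hyperlinks sharing the same anchor text) can be sketched as follows. Several pieces here are assumptions. The list of generic anchors to skip is illustrative. The "objective" is reduced to taking the top-k most frequent anchor texts. An ad is treated as a name mapped to a set of bid keywords.

```python
from collections import Counter

# Assumption: generic anchors like these carry no topical signal and are skipped.
GENERIC_ANCHORS = {"click here", "here", "link", "read more"}


def page_keywords(inbound_anchor_texts, top_k=3):
    """Rank the anchor texts of inbound links by how many links share them."""
    counts = Counter(" ".join(t.lower().split()) for t in inbound_anchor_texts)
    for generic in GENERIC_ANCHORS:
        counts.pop(generic, None)
    return [text for text, _ in counts.most_common(top_k)]


def select_ads(keywords, ads):
    """ads: ad name -> set of bid keywords. Keep ads overlapping the page keywords."""
    wanted = set(keywords)
    return [name for name, bids in ads.items() if bids & wanted]
```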

Architecture for an indexer

Disclosed is a technique for indexing data. For each token in a set of documents, a sort key is generated that includes a document identifier that indicates whether a section of a document associated with the sort key is an anchor text section or a context section, wherein the anchor text section and the context section have the same document identifier; it is determined whether a data field associated with the token is fixed width; when the data field is fixed width, the token is designated as one for which a fixed width sort is to be performed; and, when the data field is variable length, the token is designated as one for which a variable width sort is to be performed. The fixed width sort and the variable width sort are performed. For each document, the sort keys are used to bring together the anchor text section and the context section of that document.
Owner:IBM CORP
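
The sort-key idea above can be sketched by storing the section flag in the low bit of the document identifier. Because the flag sits in the low bit, a document's context section and anchor text section sort next to each other. The bit encoding is an assumption, and so is treating integer payloads as the fixed-width fields. The two separate sorts mirror the fixed-width and variable-width sorts named in the abstract.

```python
CONTEXT, ANCHOR = 0, 1   # section flag stored in the low bit (assumed encoding)


def sort_key(doc_id, section, token):
    """Document identifier carries the section flag; both sections of a doc are adjacent."""
    return ((doc_id << 1) | section, token)


def index_sort(postings):
    """postings: (doc_id, section, token, value). Returns (fixed_run, variable_run)."""
    fixed, variable = [], []
    for doc_id, section, token, value in postings:
        entry = (sort_key(doc_id, section, token), value)
        # Assumption: integer payloads are the fixed-width fields.
        (fixed if isinstance(value, int) else variable).append(entry)
    fixed.sort(key=lambda e: e[0])
    variable.sort(key=lambda e: e[0])
    return fixed, variable
```

Shifting a key right by one bit (`key[0] >> 1`) recovers the original document identifier.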

Method, system, and program for handling anchor text

Disclosed is a method, system, and program for processing anchor text for information retrieval. A set of anchors that point to a target document is formed. Anchors with same anchor text are grouped together. Information is computed for each group. Context information is generated for the target document based on the computed information.
Owner:IBM CORP
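
The grouping step above can be sketched as follows. The abstract leaves the per-group "information" unspecified. This example assumes it is the number of links and the number of distinct referring pages in each group, after normalizing case and whitespace. The result is ordered by link count, which is likewise an illustrative choice.

```python
from collections import defaultdict


def anchor_context(target, anchors):
    """anchors: (source_url, target_url, anchor_text) triples.

    Returns [(normalized_anchor_text, link_count, distinct_sources), ...]
    for links pointing at `target`, most frequent anchor text first.
    """
    groups = defaultdict(list)
    for source, dest, text in anchors:
        if dest == target:
            groups[" ".join(text.lower().split())].append(source)
    info = [(text, len(sources), len(set(sources))) for text, sources in groups.items()]
    info.sort(key=lambda row: (-row[1], row[0]))
    return info
```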

System and method for incorporating anchor text into ranking search results

Search results of a search query on a network are ranked according to a scoring function that incorporates anchor text as a term. The scoring function is adjusted so that the ranking of a document targeted by anchor text reflects the use of the anchor text's terms. Initially, the properties associated with the anchor text are collected during a crawl of the network. A separate index is generated that includes an inverted list of the documents and the terms in the anchor text. The index is then consulted in response to a query to calculate a document's score. The score is then used to rank the documents and produce the query results.
Owner:MICROSOFT TECH LICENSING LLC
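
A separate anchor text index that feeds into document scores, as described above, can be sketched with two inverted lists. One is built from page content, and the other is built from the anchor text gathered for each target during the crawl. Raw term frequency and a fixed `anchor_weight` are simplifying assumptions made for this example. A production engine would typically use a normalized function such as BM25.

```python
from collections import Counter, defaultdict


def build_index(docs):
    """docs: doc_id -> text. Returns an inverted list: term -> {doc_id: tf}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(text.lower().split()).items():
            index[term][doc_id] = tf
    return index


def rank(query, content_index, anchor_index, anchor_weight=2.0):
    """Score = content tf + anchor_weight * anchor tf, summed over query terms."""
    scores = Counter()
    for term in query.lower().split():
        for doc, tf in content_index.get(term, {}).items():
            scores[doc] += tf
        for doc, tf in anchor_index.get(term, {}).items():
            scores[doc] += anchor_weight * tf
    return [doc for doc, _ in scores.most_common()]
```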

Webpage keywords extracting method, device and system

An embodiment of the invention discloses a webpage keyword extraction method comprising the following steps: crawling internet webpages; extracting the anchor text of each crawled webpage, together with the URL (uniform resource locator) that the anchor text points to and the text surrounding the anchor text; extracting keywords from the anchor text and its surrounding text according to predefined rules; and associating the keywords with the URL of the anchor text, so that they serve as the keywords of the webpage that the URL points to. Embodiments of the invention also disclose a webpage keyword extraction device and system. This technical solution reduces the amount of computation needed to extract webpage keywords and improves the accuracy of keyword extraction.
Owner:HUAWEI TECH CO LTD

Timely and high-efficiency crawling method for internet information

Active | CN103176985A | Simplify complexity, Simplify the problem of resource allocation, Special data processing applications, Time changes, The Internet
The invention discloses a timely and high-efficiency crawling method for internet information and belongs to the field of information technology. The method comprises the following steps: (1) setting a seed address, crawling and storing webpage information, and determining navigation pages; (2) crawling each navigation page more than once, and analyzing and labeling the crawled webpages; (4) building a topic judgment model and a navigation page change time series prediction model for each website; (5) predicting the next change time of each website's navigation page, determining the next crawling time, crawling the navigation page, and extracting subpage addresses and anchor texts that have not yet been crawled; (6) using the topic judgment model to judge the subpage addresses and anchor texts extracted in the previous step, and processing each according to the judgment result; (7) based on newly crawled pages related to the topic, forming or updating the change time series of each website's navigation page, and determining the next crawling time for webpage crawling. The method ensures the novelty and topicality of the collected information under a small load.
Owner:COMP NETWORK INFORMATION CENT CHINESE ACADEMY OF SCI
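
Step (5) above predicts when a navigation page will next change. A very small stand-in for the patent's time-series prediction model is to add the mean of the most recent change intervals to the last observed change. The window of 5 intervals and the 60-second minimum are assumptions made only for this sketch.

```python
from statistics import mean


def next_crawl_time(change_times, min_interval=60.0, window=5):
    """Predict the next change time of a navigation page (seconds).

    change_times: ascending timestamps of at least one observed change.
    Assumption: a moving average of recent intervals replaces the patent's
    time-series model; min_interval keeps the crawler from polling too often.
    """
    if len(change_times) < 2:
        return change_times[-1] + min_interval
    intervals = [b - a for a, b in zip(change_times, change_times[1:])]
    return change_times[-1] + max(min_interval, mean(intervals[-window:]))
```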

Abbreviation handling in web search

A method for handling abbreviations in web queries includes building a dictionary of a plurality of possible word expansions for a plurality of potential abbreviations related to query terms received or anticipated to be received by a search engine; accepting a query including an abbreviation; expanding the abbreviation into one of the plurality of word expansions if the probability that the expansion is correct is above a threshold value, wherein the probability is determined by taking into consideration the context of the abbreviation within the query, the context including at least anchor text; and sending the query with the expanded abbreviation to the search engine to generate a search results page related to the query.
Owner:R2 SOLUTIONS
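
The threshold test above can be sketched with a small dictionary. Each abbreviation maps to its candidate expansions, and each expansion stores probabilities conditioned on context words, which could be estimated from anchor text. Approximating P(expansion | context) by the best single context word is an assumption made for this illustration.

```python
def expand_abbreviations(query, dictionary, threshold=0.6):
    """Expand abbreviations whose most likely expansion clears `threshold`.

    dictionary: abbreviation -> {expansion: {context_word: probability}}.
    """
    words = query.lower().split()
    out = []
    for i, word in enumerate(words):
        context = words[:i] + words[i + 1:]
        best, best_p = word, 0.0
        for expansion, cues in dictionary.get(word, {}).items():
            p = max((cues.get(c, 0.0) for c in context), default=0.0)
            if p > best_p:
                best, best_p = expansion, p
        out.append(best if best_p >= threshold else word)
    return " ".join(out)
```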

Anchor Text-Based Focused Web Crawler Search Method and System

The invention discloses a search method for a focused web crawler based on anchor text, and a corresponding system. The method mainly comprises the following steps: obtaining a URL (uniform resource locator) from a URL priority queue and downloading the corresponding Web page from the Internet; parsing the downloaded Web page and extracting URLs and their anchor texts; filtering the extracted URLs and anchor texts; and using an algorithm that combines TF-IDF (term frequency-inverse document frequency) and LSI (latent semantic indexing) to calculate the topic relevance of each URL, then putting the URLs that meet the condition into the priority queue. The system comprises a URL priority queue, a web crawler downloader, a Web page library, a URL parser, a URL filter and a topic relevance identifier. The method and system improve both the topic relevance of the focused crawler's results and its crawling efficiency.
Owner:INST OF AUTOMATION CHINESE ACAD OF SCI
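
The relevance-then-enqueue step above can be sketched with TF-IDF cosine similarity between the topic description and each extracted anchor text, together with a `heapq`-based priority queue. The LSI component is omitted here. The smoothed IDF formula and the 0.1 threshold are assumptions made for this example.

```python
import heapq
import math
from collections import Counter


def tfidf_vectors(texts):
    """TF-IDF vectors with smoothed IDF computed over the given texts."""
    docs = [Counter(t.lower().split()) for t in texts]
    df = Counter(term for d in docs for term in d)
    n = len(docs)
    return [{t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in d.items()}
            for d in docs]


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def enqueue_links(frontier, topic, links, threshold=0.1):
    """Push relevant (url, anchor_text) pairs onto a heap of (-score, url)."""
    vectors = tfidf_vectors([topic] + [anchor for _, anchor in links])
    topic_vec = vectors[0]
    for (url, _), vec in zip(links, vectors[1:]):
        score = cosine(topic_vec, vec)
        if score >= threshold:
            heapq.heappush(frontier, (-score, url))   # most relevant pops first
```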

Global anchor text processing

Provided are techniques for building a search index. While building the search index and using the search index to respond to one or more search requests, an anchor information store is maintained, wherein each entry of the anchor information store identifies a referring document, a target document, and anchor text associated with a link from the referring document to the target document; a document is received for processing; one or more entries in the anchor information store for which the document to be processed is identified as the target document are located; anchor text is retrieved from each of the identified entries; and the retrieved anchor text is stored in an entry of the search index for the document.
Owner:IBM CORP
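
The anchor information store above can be sketched as a map from target document to (referring document, anchor text) entries. When a document is processed, the anchor text for that document is looked up and stored alongside its body. Using a plain dict as the "search index" is an assumption made to keep the sketch self-contained.

```python
from collections import defaultdict


class AnchorStore:
    """Entries of (referring document, anchor text), keyed by target document."""

    def __init__(self):
        self._by_target = defaultdict(list)

    def add_link(self, referring, target, anchor_text):
        self._by_target[target].append((referring, anchor_text))

    def anchors_for(self, target):
        return [text for _, text in self._by_target.get(target, [])]


def index_document(index, store, doc_id, body):
    """Store a document's body together with all anchor text that targets it."""
    index[doc_id] = {"body": body, "anchor_text": store.anchors_for(doc_id)}
```

Because the store is maintained independently of the index, links discovered after a target was first indexed are picked up the next time that target is processed.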

Document editing using anchors

A user edits text in a draft document by providing input including left and right “anchor” text and replacement text. In response, a document editing system identifies an instance of the left anchor text followed by the right anchor text in the draft document, and replaces text between these instances with the replacement text specified by the user. For example, the user may type a string containing the left anchor text followed by the replacement text followed by the right anchor text, in response to which the system may perform the replacement just described. As a result, the user may specify both the location of, and a correction for, text in the draft document without using cursor keys or other navigation commands to navigate to the location of the text to be corrected, thereby increasing correction efficiency by avoiding the delay associated with such manual navigation.
Owner:MULTIMODAL TECH INC
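
The editing operation above can be sketched with plain string search. The sketch assumes that the user's input has already been split into left anchor, replacement, and right anchor. It replaces the text between the first left anchor and the next right anchor that follows it.

```python
def anchored_replace(document, left, right, replacement):
    """Replace the text between `left` and the next `right` after it.

    Returns the document unchanged if the anchors are not found in that order.
    """
    start = document.find(left)
    if start == -1:
        return document
    start += len(left)
    end = document.find(right, start)
    if end == -1:
        return document
    return document[:start] + replacement + document[end:]
```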

Method and device for generating searching result

The invention provides a method and a device for generating search results. The method comprises the following steps: S1, using the anchor text of webpages or the text clicked by users to obtain, in advance, the lexical items of each website and the weight of each lexical item, and establishing a website model for each website; S2, acquiring the user's search term and retrieving the webpages that match it; S3, calculating the domain relevance between the search term and the website of each matched webpage, using the search term and the website models established in step S1; and S4, ranking the matched webpages according to that domain relevance to generate the search results. Compared with the prior art, the method improves the domain-relevance ordering of search results and helps users find results quickly; it also makes both users and the system more efficient, reduces the number of interactions, and lightens the load on the server.
Owner:BEIJING BAIDU NETCOM SCI & TECH CO LTD
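
Steps S1 to S4 above can be sketched as follows. A website model is built from the anchor text of links into each host, with term weights normalized to sum to 1. Matched pages are then sorted by the summed weights of the query terms in their site's model. Both the normalization and the summation are assumptions made for this sketch.

```python
from collections import Counter, defaultdict
from urllib.parse import urlsplit


def build_site_models(anchor_links):
    """S1: anchor_links are (target_url, anchor_text). Returns host -> term weights."""
    counts = defaultdict(Counter)
    for url, text in anchor_links:
        counts[urlsplit(url).hostname].update(text.lower().split())
    return {host: {term: n / sum(c.values()) for term, n in c.items()}
            for host, c in counts.items()}


def rank_by_domain_relevance(query, matched_urls, models):
    """S3-S4: sort matched pages by their website's relevance to the query."""
    terms = query.lower().split()

    def relevance(url):
        model = models.get(urlsplit(url).hostname, {})
        return sum(model.get(t, 0.0) for t in terms)

    return sorted(matched_urls, key=relevance, reverse=True)
```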

Statistical machine learning-based internet hidden link detection method

The invention relates to a hidden link detection method based on statistical machine learning. The method comprises the following steps: (1) collecting real webpage source code as a training set for a classification model, and dividing the data into a category containing hidden links and a category containing no hidden links; (2) extracting the anchor texts, i.e., the character contents of link fields, from the HTML source files of all collected webpages in both categories, and segmenting the anchor texts into single words; (3) vectorizing the segmented texts of the two categories; (4) performing dimension reduction on the vector corresponding to each text; (5) training a classifier on the two categories of data obtained in step (4) to obtain a classification model; (6) applying the classification model to an unknown webpage to obtain a hidden link detection result. Whether a webpage contains hidden links is thus detected effectively and automatically from its source code, which provides theoretical and practical support for search engines cracking down on web spam.
Owner:CHINA INTERNET NETWORK INFORMATION CENTER
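
Steps (2), (3), (5) and (6) above can be sketched with the standard library alone. Anchor texts are pulled out with a regular expression and split into words, then a multinomial naive Bayes classifier with add-one smoothing is trained on them. Several substitutions are made for this example. Naive Bayes is used in place of whichever classifier the patent uses, bag-of-words counts replace the vectorization step, and dimension reduction (step 4) is skipped.

```python
import math
import re
from collections import Counter

ANCHOR_RE = re.compile(r"<a\b[^>]*>(.*?)</a>", re.IGNORECASE | re.DOTALL)
TAG_RE = re.compile(r"<[^>]+>")


def anchor_tokens(html):
    """Step (2): extract anchor texts from HTML source and split them into words."""
    tokens = []
    for inner in ANCHOR_RE.findall(html):
        tokens.extend(TAG_RE.sub(" ", inner).lower().split())
    return tokens


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing (swapped-in classifier)."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.prior = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for tokens, label in zip(docs, labels):
            self.counts[label].update(tokens)
        self.vocab = set().union(*self.counts.values())
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict(self, tokens):
        def log_prob(c):
            denom = self.totals[c] + len(self.vocab)
            return self.prior[c] + sum(
                math.log((self.counts[c][t] + 1) / denom) for t in tokens if t in self.vocab)
        return max(self.classes, key=log_prob)
```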

Accessibility web browsing method based on linkage cluster

An accessibility web browsing method based on link clustering comprises the following steps: web pages are captured from the Internet to obtain the links they contain; the URL text and anchor text corresponding to each link are extracted, together with the text of the web page corresponding to each link; the keywords in the URL text, the anchor text, and the corresponding web page text are obtained; using the keywords as features, each link in a web page is represented as a link vector (d1, d2, ..., dn) of keyword information, where di is the weight of the i-th keyword; and a clustering algorithm is applied to the link vectors so that links on the same subject form a group and the web page is redisplayed in groups. The advantage of the invention is that clustering the links of each web page lets them be displayed in a more compact, grouped form. The method suits all kinds of web pages and requires no manual background work, so blind users can use it for accessible web browsing, and ordinary users can use it to improve the quality of their browsing.
Owner:ZHEJIANG UNIV
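
Clustering the link vectors (d1, ..., dn) above can be sketched with single-pass "leader" clustering on cosine similarity. The abstract does not name its clustering algorithm, so this technique is swapped in purely for illustration. Each link joins the first cluster whose leader vector is similar enough, and otherwise it starts a new cluster. The threshold of 0.5 is an assumption.

```python
import math


def cosine(a, b):
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def cluster_links(link_vectors, threshold=0.5):
    """link_vectors: url -> {keyword: weight}. Returns lists of urls, one per group."""
    clusters = []   # (leader_vector, member_urls)
    for url, vec in link_vectors.items():
        for leader, members in clusters:
            if cosine(leader, vec) >= threshold:
                members.append(url)
                break
        else:
            clusters.append((vec, [url]))
    return [members for _, members in clusters]
```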

Audio/video intelligent catalog information acquisition method facing to wide area network

The invention relates to a method for acquiring intelligent catalog information for audio / video resources over a wide area network, in the field of computer applications. It provides a weighting algorithm based on the position of keyword feature items: different weighting factors are assigned to features in different positions of a file, so that the topic similarity of webpage contents can be calculated more accurately. Links with higher topic similarity are preferentially selected by jointly using three factors: the similarity of the webpage contents, the URL (Uniform Resource Locator) directory level of the hyperlink, and the anchor text of the hyperlink. The cataloguing information of the topic webpages found is extracted automatically using an information extraction method based on ontology and HTML (Hypertext Markup Language), and the extracted cataloguing information is standardized using an improved semantic similarity calculation method. The invention can supply description item information to cataloguers intelligently and automatically, reduces staff workload, improves cataloguing efficiency, and adapts to the differing requirements of professional and non-professional cataloguers and to the wide area network environment.
Owner:COMMUNICATION UNIVERSITY OF CHINA
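
The position-based weighting and three-factor link selection above can be sketched as follows. The abstract gives no numbers, so the position weights, the mixing coefficients `a`, `b`, `c`, and the rule that shallower URL directories score higher are all assumptions made for illustration.

```python
from collections import Counter
from urllib.parse import urlsplit

# Assumed weights per position in a file; the abstract gives no values.
POSITION_WEIGHTS = {"title": 3.0, "heading": 2.0, "anchor": 2.0, "body": 1.0}


def weighted_terms(sections):
    """sections: position -> text. Returns term -> position-weighted frequency."""
    weights = Counter()
    for position, text in sections.items():
        for term in text.lower().split():
            weights[term] += POSITION_WEIGHTS.get(position, 1.0)
    return weights


def link_score(content_sim, url, anchor_sim, a=0.5, b=0.2, c=0.3):
    """Mix content similarity, URL directory level and anchor text similarity.

    Assumption: shallower directories score higher, and the three factors are
    combined linearly.
    """
    depth = urlsplit(url).path.strip("/").count("/")
    return a * content_sim + b / (1 + depth) + c * anchor_sim
```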

Pipelined architecture for global analysis and index building

Disclosed is a technique for building an index. A new index_{i+1} is built, and an anchor text table_{i+1} and a duplicates table_{i+1} are output, using a store_i, a delta store, and previously generated global analysis computations_i, wherein the previously generated global analysis computations_i include an anchor text table_i, a rank table_i, and a duplicates table_i. New global analysis computations_{i+1} are generated using the anchor text table_{i+1}, the duplicates table_{i+1}, and the previously generated global analysis computations_i.
Owner:GOOGLE LLC

User access content-based real-time personalized information collection method

The invention discloses a real-time personalized information collection method based on user access content. The method comprises the following steps: obtaining the current seed page by analyzing the user's network requests in real time, and extracting the structural information of the webpage; extracting topic keywords from various angles according to that structural information and forming a topic keyword entry; extracting the anchor text of each sub-link of the current seed page, segmenting the anchor text into words according to the topic keyword entry, building a vector space model from the segmentation result, and calculating the topic relevance between the sub-link and the current seed page using cosine similarity in the vector space model; treating sub-links whose topic relevance exceeds a set threshold as effective sub-links; building a link topic classification base, setting seed link priorities, and classifying the topic of the current seed link; and calculating the importance of all sub-links in the link topic classification base, ranking the sub-links by importance, and downloading and storing the corresponding pages in that order.
Owner:SHANDONG UNIV
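
The segmentation and threshold steps above can be sketched as follows. Anchor text is segmented against the topic keyword entry using forward maximum matching, a common dictionary-based method for Chinese text, and is then compared with the topic keywords by cosine similarity. The choice of segmenter and the 0.3 threshold are assumptions made for this example. The usage data is Chinese because unspaced text is what this kind of segmentation is for.

```python
import math
from collections import Counter


def fmm_segment(text, dictionary, max_len=4):
    """Forward maximum matching: take the longest dictionary word at each position."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                words.append(piece)
                i += size
                break
    return words


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def effective_sublinks(sublinks, topic_keywords, threshold=0.3):
    """sublinks: (url, anchor_text). Keep those whose relevance exceeds threshold."""
    dictionary = set(topic_keywords)
    max_len = max(map(len, dictionary))
    topic = Counter(topic_keywords)
    return [url for url, anchor in sublinks
            if cosine(topic, Counter(fmm_segment(anchor, dictionary, max_len))) > threshold]
```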

Method for handling anchor text

Disclosed is a method for processing anchor text for information retrieval. A set of anchors that point to a target document is formed. Anchors with same anchor text are grouped together. Information is computed for each group. Context information is generated for the target document based on the computed information.
Owner:IBM CORP