124 results about "Anchor text" patented technology

The anchor text, link label, or link text is the visible, clickable text in an HTML hyperlink. The term "anchor" was used in older versions of the HTML specification for what is now called the "a element". The HTML specification has no specific term for anchor text but refers to it as "text that the a element wraps around". In XML terms (as in XHTML, the XML serialization of HTML), the anchor text is the content of the a element, provided that the content is text.
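
As a minimal sketch of this definition, the Python below uses the standard-library html.parser module to collect the anchor text of every a element that carries an href attribute. The class and function names are illustrative, and collapsing whitespace inside the anchor text is a design choice, not part of any specification.

```python
from html.parser import HTMLParser


class AnchorTextExtractor(HTMLParser):
    """Collect (href, anchor text) pairs from <a href="..."> elements."""

    def __init__(self):
        super().__init__()
        self.links = []    # list of (href, anchor_text) tuples
        self._href = None  # href of the <a> element currently open, if any
        self._parts = []   # text fragments seen inside that element

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")  # None for <a name="...">
            self._parts = []

    def handle_data(self, data):
        if self._href is not None:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace so "example\n site" becomes "example site".
            text = " ".join("".join(self._parts).split())
            self.links.append((self._href, text))
            self._href = None


def extract_anchor_text(html):
    parser = AnchorTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.links
```

Text inside nested inline elements such as <b> is included, so extract_anchor_text('<p>See <a href="https://example.com/">the <b>example</b> site</a>.</p>') returns [('https://example.com/', 'the example site')].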

Generating hyperlinks and anchor text in HTML and non-HTML documents

Inactive | US20050149851A1 | Natural language data processing, Website content management, Hyperlink, Document preparation

Systems and methods for generation of hyperlinks and anchor text from data such as reference text in HTML and in non-HTML documents are disclosed. The method generally includes locating a text reference in a source document, searching using a search engine for a target document relating to the text reference, computing anchor text from the text reference, generating a hyperlink to the target document, and associating the hyperlink with the computed anchor text. The locating and / or computing may be based on a respective statistical model of text formatting and / or lexical cues. The text reference may be parsed into pieces such that the searching, computing, generating, and associating are performed for each piece of text. The source document may be an HTML or non-HTML document. The text reference may be a reference to, for example, a paper, article, company, institution, product, search engine, image, object, and geographical location.

Owner:GOOGLE LLC

Systems and methods for using anchor text as parallel corpora for cross-language information retrieval

Inactive | US7146358B1 | Quality improvement, Less translation, Data processing applications, Web data indexing, Documentation procedure, Ambiguity

A system performs cross-language query translations. The system receives a search query that includes terms in a first language and determines possible translations of the terms of the search query into a second language. The system also locates documents for use as parallel corpora to aid in the translation by: (1) locating documents in the first language that contain references that match the terms of the search query and identify documents in the second language; (2) locating documents in the first language that contain references that match the terms of the query and refer to other documents in the first language and identify documents in the second language that contain references to the other documents; or (3) locating documents in the first language that match the terms of the query and identify documents in the second language that contain references to the documents in the first language. The system may use the second language documents as parallel corpora to disambiguate among the possible translations of the terms of the search query and identify one of the possible translations as a likely translation of the search query into the second language.

Owner:GOOGLE LLC

System and methods for automatic clustering of ranked and categorized search objects

Inactive | US20100131563A1 | Great cognitive value, Great relevance, Digital data information retrieval, Digital data processing details, Frequency of occurrence, Internal link

A search results page includes multiple search lists generated by multiple clustering operations applied to an initial match set of documents selected based on a user query. A first result list is constructed by clustering a top-n set of documents by primary domain address and sorting based on extrinsic ranking factors, so that the first list is a ranked, ordered list of primary-domain linked anchor text. A second result list is constructed by clustering the top-n set of documents based on a unified ranked occurrence of keywords within the top-n set of documents. The second list contains a plurality of cluster class references, each including a ranked, ordered sub-list of the keywords that occur within the top-n set of documents and are associated with that cluster class reference; each keyword in these sub-lists links to a corresponding document in the top-n set. A third result list is constructed by clustering the top-n set of documents based on the ranked frequency of occurrence of internally linked anchor texts. The third result list includes the top-n internally linked anchor texts, each with a ranked, ordered sub-list of links to primary domain Web pages containing that anchor text.

Owner:YEBOL CORP

Personalizing anchor text scores in a search engine

Active | US7260573B1 | Data processing applications, Web data indexing, Personalization, Ranking

A search engine identifies a list of documents from a set of documents in a database in response to a set of query terms. For each document in the list, the search engine determines an information retrieval score based on its content and the query terms, and also identifies a set of source documents that have links to the document and that also have anchor text satisfying a predefined requirement with respect to the query terms. The search engine calculates a personalized page importance score for each of the identified source documents according to a set of user-specific parameters and accumulates the personalized page importance scores to produce a personalized anchor text score for the document. The personalized anchor text score is then combined with the document's information retrieval score to generate a personalized ranking for the document. The documents are ordered according to their respective personalized rankings.

Owner:GOOGLE LLC
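
The scoring scheme above can be sketched in Python under several stated assumptions. The "predefined requirement" is modeled as the anchor text containing every query term, the user-specific parameters as a hypothetical per-host preference multiplier applied to a base page importance, and the final combination as a weighted sum with a hypothetical weight alpha. None of these choices comes from the patent text.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass(frozen=True)
class Link:
    source: str       # absolute URL of the page containing the link
    target: str       # identifier of the linked-to document
    anchor_text: str


def anchor_matches(anchor_text, query_terms):
    # Assumed "predefined requirement": every query term occurs in the anchor text.
    words = set(anchor_text.lower().split())
    return all(term.lower() in words for term in query_terms)


def personalized_ranking(ir_scores, links, query_terms, base_importance,
                         user_site_weights, alpha=0.5):
    """Order documents by ir_score + alpha * personalized anchor text score."""
    # Collect the *set* of qualifying source documents for each result document.
    sources = {doc: set() for doc in ir_scores}
    for link in links:
        if link.target in sources and anchor_matches(link.anchor_text, query_terms):
            sources[link.target].add(link.source)

    def personalized_importance(url):
        # Hypothetical user-specific parameter: a per-host preference multiplier.
        host = urlsplit(url).netloc
        return base_importance.get(url, 0.0) * user_site_weights.get(host, 1.0)

    # Accumulate source importances into an anchor score, then combine.
    ranking = {
        doc: ir_scores[doc] + alpha * sum(personalized_importance(s) for s in sources[doc])
        for doc in ir_scores
    }
    return sorted(ranking, key=ranking.get, reverse=True)
```

Changing only user_site_weights can reorder the same result list, which is the point of personalizing the anchor text score rather than the content score.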

System and methods for inferring intent of website visitors and generating and packaging visitor information for distribution as sales leads or market intelligence

Active | US20100131835A1 | Without ambiguity, Interprogram communication, Commerce, Web site, Behavioral analytics

A system for inferring the intent of visitors to a website has a visitor-tracking application executing from a digital medium coupled to a server hosting the website, the server being connected to a repository adapted to store data about visitor behavior, and an inference engine for processing the data to infer the intent of visitors. Visitor behavior relative to links is tracked, and a visitor's intent is inferred from analysis of that behavior, from meaning deduced from the anchor text of the links selected, or from a combination of the two.

Owner:CALLIDUS SOFTWARE

Using web structure for classifying and describing web pages

Inactive | US20030221163A1 | Web data indexing, Special data processing applications, Hyperlink, Network structure

An enhanced method and system for classifying a target web page and describing a set of web pages using virtual documents, in which a virtual document comprises extended anchortext extracted from each of a plurality of web pages that include at least one hyperlink citing the target web page.

Owner:NEC LAB AMERICA

Search engine and method with improved relevancy, scope, and timeliness

Inactive | US20050004943A1 | Scope efficiency, Timeliness efficiency, Digital data information retrieval, Digital data processing details, Hyperlink, Paper document

A search engine and a method achieve timeliness of documents returned in a search result by a relevancy feedback mechanism driven by the frequency with which a URL is returned in recent searches. The relevancy feedback mechanism includes one or more random processes that determine whether a cached or indexed web page associated with a URL in the search result should be refreshed. The random processes also determine whether hyperlinks in the cached or indexed web page should be followed to access related web pages. Web page accesses resulting from the random processes are used to update any document index maintained by the search engine. Relevancy scoring functions implemented in look-up tables are also disclosed. A more accurate relevancy scoring function is achieved using a lexicon based on the anchor texts of hyperlinks extracted from web documents.

Owner:AFFINI

Automatic object reference identification and linking in a browseable fact repository

Active | US20070198481A1 | Digital data processing details, Visual data mining, World Wide Web, Data science

Links between facts associated with objects are automatically created and maintained in a fact repository. Names of objects are automatically identified in the facts and collected into a list of names. The facts are then processed to identify such names in them. Identified names are used as anchor text for search links. A search link includes a search query for a search engine that searches the fact repository for facts associated with objects having the same name.

Owner:GOOGLE LLC

Serving advertisements based on keywords related to a webpage determined using external metadata

Inactive | US20080027798A1 | Improve accuracy, Maximize advertiser revenue, Advertisements, Hyperlink, Inside information

Methods and apparatus are provided for selecting advertisements to display to a user requesting a primary webpage. Keywords related to the primary webpage are determined using internal information of the primary webpage and / or external information provided in neighboring webpages. The external information may include anchor text metadata of hyperlinks on neighboring webpages that link to the primary webpage, or the number of such hyperlinks having the same particular anchor text. Other internal and / or external information may be used to determine a list of keywords related to the primary webpage. One or more keywords on the list are selected to represent the primary webpage according to one or more objectives. One or more advertisements are selected to be served to the user using the selected keywords. Machine learning techniques may be used to develop a model that automatically determines keywords representing a webpage.

Owner:YAHOO INC

Architecture for an indexer

Inactive | US20070271268A1 | Data processing applications, Web data indexing, Document Identifier, Data field

Disclosed is a technique for indexing data. For each token in a set of documents, a sort key is generated that includes a document identifier indicating whether the section of the document associated with the sort key is an anchor text section or a context section, wherein the anchor text section and the context section have the same document identifier; it is determined whether a data field associated with the token is fixed width; when the data field is fixed width, the token is designated as one for which a fixed width sort is to be performed; and, when the data field is variable length, the token is designated as one for which a variable width sort is to be performed. The fixed width sort and the variable width sort are performed. For each document, the sort keys are used to bring together the anchor text section and the context section of that document.

Owner:IBM CORP
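
The sort-key idea can be illustrated with a small sketch, assuming a hypothetical key layout: the low bit of a fixed-width, big-endian 64-bit field flags the section type, and the remaining bits hold the document identifier shared by both sections. Because big-endian unsigned integers compare bytewise in numeric order, sorting the keys places each document's context and anchor text sections next to each other. The separate fixed-width and variable-width sort paths described in the abstract are not modeled.

```python
import struct

CONTEXT, ANCHOR = 0, 1  # hypothetical section flags stored in the key's low bit


def sort_key(doc_id, section, token):
    """Fixed-width, big-endian document field followed by the token bytes."""
    # Both sections of a document share doc_id in the high bits (doc_id < 2**63).
    tagged_id = (doc_id << 1) | section
    return struct.pack(">Q", tagged_id) + token.encode("utf-8")


def document_of(key):
    """Recover the shared document identifier from a sort key."""
    return struct.unpack(">Q", key[:8])[0] >> 1


def section_of(key):
    """Recover the section flag (CONTEXT or ANCHOR) from a sort key."""
    return struct.unpack(">Q", key[:8])[0] & 1


def order_postings(postings):
    """postings: iterable of (doc_id, section, token) tuples."""
    return sorted(postings, key=lambda p: sort_key(*p))
```

After sorting, a single pass over the keys can merge each document's two sections, since document_of returns the same identifier for both.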

Method, system, and program for handling anchor text

Active | US20050165781A1 | Data processing applications, Web data indexing, Paper document, Document preparation

Disclosed is a method, system, and program for processing anchor text for information retrieval. A set of anchors that point to a target document is formed. Anchors with same anchor text are grouped together. Information is computed for each group. Context information is generated for the target document based on the computed information.

Owner:IBM CORP
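
A minimal sketch of the grouping step, assuming anchors arrive as (source URL, target, anchor text) tuples. The per-group statistics (anchor count and number of distinct source hosts) and the way the context string is assembled are hypothetical choices, since the abstract does not say which information is computed.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def anchor_context(anchors, target):
    """anchors: iterable of (source_url, target, anchor_text) tuples."""
    # Form the set of anchors pointing at `target`, grouped by normalized text.
    groups = defaultdict(list)
    for source, dest, text in anchors:
        if dest == target:
            groups[" ".join(text.lower().split())].append(source)
    # Hypothetical per-group information: anchor count and distinct source hosts.
    stats = {
        text: {"count": len(sources),
               "hosts": len({urlsplit(s).netloc for s in sources})}
        for text, sources in groups.items()
    }
    # Context for the target: group texts, most frequent first, ties alphabetical.
    ordered = sorted(stats, key=lambda t: (-stats[t]["count"], t))
    return stats, " | ".join(ordered)
```

Counting distinct hosts alongside raw counts is one way to discount many identical anchors coming from a single site.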

System and method for incorporating anchor text into ranking search results

Inactive | US20060074871A1 | Boosting document ranking, Promote production, Web data indexing, Special data processing applications, Ranking, Paper document

Search results of a search query on a network are ranked according to a scoring function that incorporates anchor text as a term. The scoring function is adjusted so that a target document's ranking reflects the terms used in the anchor text pointing to it. Initially, the properties associated with the anchor text are collected during a crawl of the network. A separate index is generated that includes an inverted list of the documents and the terms in the anchor text. The index is then consulted in response to a query to calculate a document's score. The score is then used to rank the documents and produce the query results.

Owner:MICROSOFT TECH LICENSING LLC
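
The following sketch shows one way to fold anchor text into a scoring function through a separate inverted index, as the abstract describes. It treats each anchor occurrence as anchor_weight extra occurrences of the term in the target document and uses a simple log-scaled TF-IDF; the weight, the formula, and all names are assumptions rather than the scoring function claimed in the patent.

```python
import math
from collections import Counter, defaultdict


def build_index(texts):
    """Inverted index: term -> {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in texts.items():
        for term, tf in Counter(text.lower().split()).items():
            index[term][doc_id] = tf
    return index


def rank(query, body_index, anchor_index, num_docs, anchor_weight=2.0):
    """Rank documents using both the body index and the separate anchor index."""
    scores = defaultdict(float)
    for term in query.lower().split():
        body = body_index.get(term, {})
        anchor = anchor_index.get(term, {})
        docs = set(body) | set(anchor)
        if not docs:
            continue
        idf = math.log(1 + num_docs / len(docs))
        for doc in docs:
            # Anchor occurrences count as weighted extra occurrences in the target.
            tf = body.get(doc, 0) + anchor_weight * anchor.get(doc, 0)
            scores[doc] += idf * math.log(1 + tf)
    return sorted(scores, key=scores.get, reverse=True)
```

Here the anchor index would be built from the anchor text gathered during the crawl, keyed by target document, so a page can rank for terms that appear only in links pointing to it.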

Method and device for excavating semantic keywords from text

Inactive | CN104239300A | Special data processing applications, Data mining, Data science

The invention discloses a method and a device for mining semantic keywords from text. The method comprises: searching the text for known words to obtain multiple candidate keywords; calculating a candidate probability for each candidate keyword based on a reference probability and / or the context of the known words, wherein the reference probability is the probability that a known word appears as anchor text, and the candidate probability is the probability that a candidate keyword is a semantic keyword; and determining, based on these candidate probabilities, whether each candidate keyword is a semantic keyword of the text.

Owner:FUJITSU LTD
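
A sketch of the reference-probability idea, assuming known words come with counts of how often they occur and how often they occur as anchor text (for example, in a hyperlinked encyclopedia). For simplicity it uses the reference probability alone as the candidate probability and ignores the context term the abstract also allows; the threshold value is hypothetical.

```python
import re


def reference_probability(anchor_counts, occurrence_counts):
    """P(known word appears as anchor text) = times linked / times seen."""
    return {w: anchor_counts.get(w, 0) / n
            for w, n in occurrence_counts.items() if n > 0}


def semantic_keywords(text, ref_prob, threshold=0.1):
    # Normalize to space-separated lowercase words so matches respect word boundaries.
    normalized = " " + " ".join(re.findall(r"\w+", text.lower())) + " "
    # Candidates: known words (possibly multi-word) that occur in the text.
    candidates = [w for w in ref_prob if f" {w} " in normalized]
    # Simplification: candidate probability = reference probability (context ignored).
    keywords = [w for w in candidates if ref_prob[w] >= threshold]
    return sorted(keywords, key=lambda w: -ref_prob[w])
```

Frequent function words such as "the" occur constantly but are rarely linked, so their reference probability stays low and they are filtered out.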

Webpage keywords extracting method, device and system

Active | CN102135967A | Reduce parsing, High precision, Special data processing applications, The Internet, Engineering

The embodiment of the invention discloses a webpage keyword extraction method comprising the following steps: crawling Internet webpages; extracting the anchor texts of each crawled webpage, together with the URL (uniform resource locator) of each anchor text and the text surrounding it; extracting keywords from the anchor text and its surrounding text according to predefined rules; and associating the keywords with the URL of the anchor text, so that they serve as the keywords of the webpage the URL points to. The embodiment of the invention also discloses a webpage keyword extraction device and system. This technical scheme reduces the computation required for webpage keyword extraction and improves its accuracy.

Owner:HUAWEI TECH CO LTD

Timely and high-efficiency crawling method for internet information

Active | CN103176985A | Simplify complexity, Simplify the problem of resource allocation, Special data processing applications, Time changes, The Internet

The invention discloses a timely and efficient crawling method for Internet information, in the field of information technology. The method comprises the following steps: (1) setting a seed address, crawling and storing webpage information, and determining the navigation pages; (2) crawling each navigation page more than once, and analyzing and labeling the crawled webpages; (4) building a topic judgment model and a time-series prediction model of navigation page changes for each website; (5) predicting the next change time of each website's navigation pages, determining the next crawl time, crawling the navigation page, and extracting subpage addresses and anchor texts that have not yet been crawled; (6) using the topic judgment model to judge the subpage addresses and anchor texts extracted in the previous step, and processing each according to the judgment result; (7) based on newly crawled topic-relevant pages, forming or updating the change time series of each website's navigation pages, and determining the next crawl time. The method ensures the novelty and topicality of the collected information under a small load.

Owner:COMP NETWORK INFORMATION CENT CHINESE ACADEMY OF SCI

Abbreviation handling in web search

Active | US20090259629A1 | Promote results, Digital data information retrieval, Digital data processing details, Anchor text, Search engine results page

A method for handling abbreviations in web queries includes building a dictionary of possible word expansions for potential abbreviations related to query terms received, or anticipated to be received, by a search engine; accepting a query that includes an abbreviation; expanding the abbreviation into one of the word expansions if the probability that the expansion is correct is above a threshold value, wherein the probability is determined by taking into consideration the context of the abbreviation within the query, and the context includes at least anchor text; and sending the query with the expanded abbreviation to the search engine to generate a search results page related to the query.

Owner:R2 SOLUTIONS
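
A minimal sketch of threshold-based expansion. The dictionary format and the scoring rule (a prior count multiplied by one plus the number of query words shared with anchor text previously associated with each expansion) are hypothetical stand-ins for the probability model, which the abstract does not specify.

```python
def expand_abbreviations(query, abbr_dict, threshold=0.6):
    """abbr_dict: abbreviation -> {expansion: (prior_count, anchor_context_words)}."""
    terms = query.lower().split()
    out = []
    for i, term in enumerate(terms):
        options = abbr_dict.get(term)
        if not options:
            out.append(term)
            continue
        context = set(terms[:i] + terms[i + 1:])
        # Hypothetical score: prior * (1 + words shared with anchor-text context).
        scores = {exp: prior * (1 + len(context & words))
                  for exp, (prior, words) in options.items()}
        total = sum(scores.values())
        best = max(scores, key=scores.get)
        # Expand only when the normalized probability clears the threshold.
        out.append(best if total > 0 and scores[best] / total > threshold else term)
    return " ".join(out)
```

Leaving the abbreviation untouched when no expansion is clearly dominant avoids forcing a wrong guess on an ambiguous query.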

Anchor Text-Based Focused Web Crawler Search Method and System

Active | CN102298622A | Special data processing applications, The Internet, Web crawler

The invention discloses a focused web crawler search method and system based on anchor text. The method mainly comprises the following steps: obtaining a URL (uniform resource locator) from a URL priority queue and downloading the corresponding Web page from the Internet; parsing the downloaded Web page and extracting its URLs and their anchor texts; filtering the extracted URLs and anchor texts; and using an algorithm that combines TF-IDF (term frequency-inverse document frequency) and LSI (latent semantic indexing) to calculate the topic relevance of each URL, placing the URLs that meet the condition into the priority queue. The system comprises a URL priority queue, a web crawler downloader, a Web page library, a URL parser, a URL filter, and a topic relevance identifier. The method and system improve both the topic relevance of the focused crawler's results and its crawling efficiency.

Owner:INST OF AUTOMATION CHINESE ACAD OF SCI
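
The crawl loop can be sketched with a heap-based URL priority queue. This sketch scores each extracted URL by TF-IDF cosine similarity between its anchor text and a topic description; the LSI component mentioned in the abstract is omitted, and fetch_links is a hypothetical stand-in for downloading and parsing a page, so the example runs offline.

```python
import heapq
import math
from collections import Counter


def tfidf_vector(text, idf):
    tf = Counter(text.lower().split())
    return {t: c * idf.get(t, 0.0) for t, c in tf.items()}


def cosine(a, b):
    dot = sum(v * b.get(t, 0.0) for t, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def focused_crawl(seeds, fetch_links, topic, idf, min_relevance=0.2, limit=10):
    """fetch_links(url) -> list of (url, anchor_text); stands in for download + parse."""
    topic_vec = tfidf_vector(topic, idf)
    queue = [(-1.0, url) for url in seeds]  # negated scores turn heapq into a max-heap
    heapq.heapify(queue)
    seen, order = set(seeds), []
    while queue and len(order) < limit:
        _, url = heapq.heappop(queue)
        order.append(url)
        for link, anchor in fetch_links(url):
            if link in seen:
                continue  # URL filter: skip URLs already queued or crawled
            score = cosine(tfidf_vector(anchor, idf), topic_vec)
            if score >= min_relevance:  # topic relevance check on the anchor text
                seen.add(link)
                heapq.heappush(queue, (-score, link))
    return order
```

Because relevance is judged from anchor text before download, off-topic pages are never fetched, which is where the efficiency gain of a focused crawler comes from.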

Global anchor text processing

Inactive | US20080162425A1 | Reduce the number of times, Web data indexing, Digital data processing details, Document preparation, Documentation

Provided are techniques for building a search index. While building the search index and using the search index to respond to one or more search requests, an anchor information store is maintained, wherein each entry of the anchor information store identifies a referring document, a target document, and anchor text associated with a link from the referring document to the target document; a document is received for processing; one or more entries in the anchor information store for which the document to be processed is identified as the target document are located; anchor text is retrieved from each of the identified entries; and the retrieved anchor text is stored in an entry of the search index for the document.

Owner:IBM CORP
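
A sketch of the anchor information store, keyed by target document so that the lookup in the abstract is a single dictionary access. The class, function, and index layout are hypothetical. In this simplified form a document only receives anchor text from referring documents processed before it; a real indexer would need to revisit or merge later entries.

```python
from collections import defaultdict


class AnchorStore:
    """Entries of (referring document, target document, anchor text), indexed by target."""

    def __init__(self):
        self._by_target = defaultdict(list)

    def add(self, referring, target, anchor_text):
        self._by_target[target].append((referring, anchor_text))

    def entries_for(self, target):
        # .get avoids creating empty entries for documents nobody links to.
        return list(self._by_target.get(target, []))


def index_document(doc_id, body, outlinks, store, search_index):
    """outlinks: list of (target, anchor_text) pairs found in this document."""
    # Maintain the store: record this document's own links as referring entries.
    for target, text in outlinks:
        store.add(doc_id, target, text)
    # Locate entries naming this document as target and keep their anchor text.
    anchor_texts = [text for _, text in store.entries_for(doc_id)]
    search_index[doc_id] = {"body": body, "anchor_text": anchor_texts}
```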

Document editing using anchors

Active | US20090113293A1 | Improve error correction efficiency, Avoid delay, Digital data information retrieval, Natural language data processing, Document preparation, Documentation

A user edits text in a draft document by providing input including left and right “anchor” text and replacement text. In response, a document editing system identifies an instance of the left anchor text followed by the right anchor text in the draft document, and replaces text between these instances with the replacement text specified by the user. For example, the user may type a string containing the left anchor text followed by the replacement text followed by the right anchor text, in response to which the system may perform the replacement just described. As a result, the user may specify both the location of, and a correction for, text in the draft document without using cursor keys or other navigation commands to navigate to the location of the text to be corrected, thereby increasing correction efficiency by avoiding the delay associated with such manual navigation.

Owner:MULTIMODAL TECH INC
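
The replacement step can be sketched in a few lines of Python. When the left anchor occurs more than once, this sketch picks the occurrence that gives the shortest span before a following right anchor; that tie-breaking rule is an assumption, not something the abstract specifies.

```python
def edit_between_anchors(draft, left, right, replacement):
    """Replace the text between a left anchor and the next right anchor."""
    best = None  # (gap_length, gap_start, gap_end) of the tightest match so far
    start = draft.find(left)
    while start != -1:
        gap_start = start + len(left)
        gap_end = draft.find(right, gap_start)
        if gap_end == -1:
            break  # no right anchor after this (or any later) left anchor
        if best is None or gap_end - gap_start < best[0]:
            best = (gap_end - gap_start, gap_start, gap_end)
        start = draft.find(left, start + 1)
    if best is None:
        raise ValueError("left anchor followed by right anchor not found")
    _, gap_start, gap_end = best
    return draft[:gap_start] + replacement + draft[gap_end:]
```

For example, edit_between_anchors("a history of hyper tension and diabetes", "history of ", " and", "hypertension") corrects the draft without any cursor navigation, matching the motivation in the abstract.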

Method and device for generating searching result

Active | CN103186574A | Boost Relevance Ranking, Find quickly, Special data processing applications, Interaction time, Lexical item

The invention provides a method and a device for generating a search result. The method comprises the following steps: S1, using the anchor text of webpages or users' click text in advance to obtain the lexical items of each website and the weight of each lexical item, and establishing a website model for each website; S2, acquiring the user's search term and retrieving the webpages that match it; S3, computing the domain relevance between the search term and the website of each matched webpage, using the search term and the website models established in step S1; and S4, ranking the matched webpages by that domain relevance to generate the search result. Compared with the prior art, the method improves the domain relevance ranking of search results and helps users find results quickly, while improving user and system efficiency, reducing the number of interactions, and lightening the load on the server.

Owner:BEIJING BAIDU NETCOM SCI & TECH CO LTD

Statistical machine learning-based internet hidden link detection method

Active | CN104239485A | Efficient detection, Web data indexing, Digital data protection, Learning based, Source code file

The invention relates to a hidden link detection method based on statistical machine learning. The method comprises the following steps: (1) collecting real webpage source code as a training set for a classification model, divided into a category that contains hidden links and a category that does not; (2) extracting the anchor texts, i.e., the character content of link fields, from the HTML source files of all collected webpages in both categories, then segmenting the anchor texts into individual words; (3) vectorizing the segmented texts of both categories; (4) reducing the dimensionality of each text's vector; (5) training a classifier on the two categories of data obtained in step (4) to obtain a classification model; (6) applying the classification model to an unknown webpage to obtain a hidden link detection result. Whether a webpage contains hidden links is thus detected effectively and automatically from its source code, providing theoretical and practical support for search engines to crack down on web spam.

Owner:CHINA INTERNET NETWORK INFORMATION CENTER

Accessibility web browsing method based on linkage cluster

Active | CN101986297A | Improve browsing efficiency, Quick lock, Special data processing applications, Cluster algorithm, The Internet

An accessible web browsing method based on link clustering comprises the following steps: capturing web pages from the Internet and obtaining the links in them; extracting the URL text and anchor text of each link, together with the text of the web page each link points to; obtaining the keywords in the URL text, the anchor text, and the corresponding web page text; using the keywords as features to represent every link as a link vector of keyword information, wherein d_i is the weight of the i-th keyword in the link vector; and clustering the link vectors with a clustering algorithm so that links on the same subject form a group, then redisplaying the web page in grouped form. The advantage of the invention is that the links of each web page are clustered and displayed in a more compact, grouped manner; the method works with all kinds of web pages and requires no manual background work, so blind users can use it for accessible web browsing, and other users can use it to improve the quality of their browsing.

Owner:ZHEJIANG UNIV

Audio/video intelligent catalog information acquisition method facing to wide area network

Inactive | CN101968819A | Adapt to different needs, Adapt to the environment, Special data processing applications, Uniform resource locator, File area network

The invention relates to an intelligent method for acquiring audio / video catalog information over a wide area network, in the field of computer applications. It is characterized as follows: a weighting algorithm based on the position of keyword feature items is proposed, assigning different weighting factors to features at different positions in a file so that the topic similarity of webpage content can be calculated more accurately; links with higher topic similarity are preferentially selected by jointly using three factors (the similarity of webpage content, the URL (Uniform Resource Locator) directory-level information of the hyperlink, and the anchor text of the hyperlink); cataloguing information is automatically extracted from the retrieved topic webpages by an information extraction method based on ontology and HTML (Hypertext Markup Language); and the extracted cataloguing information is normalized using an improved semantic similarity calculation method. The advantages of the invention are that descriptive item information can be provided to cataloguers intelligently and automatically, the workload of staff is reduced, cataloguing efficiency is improved, and the method adapts to the different requirements of professional and non-professional cataloguers and to the wide area network environment.

Owner:COMMUNICATION UNIVERSITY OF CHINA

Calculating a downloading priority for the uniform resource locator in response to the domain density score, the anchor text score, the URL string score, the category need score, and the link proximity score for targeted web crawling

Active | US7672943B2 | Improve efficiency, Avoid excessive computation, Digital data information retrieval, Digital data processing details, Combined use, Application software

A web crawler system as described herein utilizes a targeted approach to increase the likelihood of downloading web pages of a desired type or category. The system employs a plurality of URL scoring metrics that generate individual scores for outlinked URLs contained in a downloaded web page. For each outlinked URL, the individual scores are combined using an appropriate algorithm or formula to generate an overall score that represents a downloading priority for the outlinked URL. The web crawler application can then download subsequent web pages in an order that is influenced by the downloading priorities.
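
The title lists five per-URL metrics (domain density, anchor text, URL string, category need, link proximity), while the abstract says only that they are combined "using an appropriate algorithm or formula". A minimal sketch, assuming a weighted sum with hypothetical weights and a heap-based crawl frontier that always yields the highest-priority URL next:

```python
import heapq
from dataclasses import dataclass, field

# HYPOTHETICAL weights; the patent abstract does not disclose the formula.
WEIGHTS = {
    "domain_density": 0.2,
    "anchor_text": 0.3,
    "url_string": 0.2,
    "category_need": 0.2,
    "link_proximity": 0.1,
}

def downloading_priority(scores):
    """Weighted sum of per-metric scores, each assumed to lie in [0, 1]."""
    return sum(WEIGHTS[name] * scores.get(name, 0.0) for name in WEIGHTS)

@dataclass(order=True)
class _Entry:
    neg_priority: float               # heapq is a min-heap, so store -priority
    seq: int                          # insertion order breaks ties (FIFO)
    url: str = field(compare=False)

class Frontier:
    """Crawl frontier that pops the outlinked URL with the highest priority."""

    def __init__(self):
        self._heap = []
        self._seq = 0
        self._seen = set()

    def add(self, url, scores):
        if url in self._seen:         # ignore URLs already queued
            return
        self._seen.add(url)
        entry = _Entry(-downloading_priority(scores), self._seq, url)
        heapq.heappush(self._heap, entry)
        self._seq += 1

    def pop(self):
        return heapq.heappop(self._heap).url

    def __len__(self):
        return len(self._heap)
```

A crawler loop would call `Frontier.add` for every outlink of each downloaded page and `Frontier.pop` to choose the next page, so download order follows the combined priority.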

Owner:MICROSOFT TECH LICENSING LLC

System and method for incorporating anchor text into ranking search results

Inactive | US7739277B2 | Boosting document ranking | Promote production | Web data indexing | Digital data processing details | Ranking | Document preparation

Search results for a query on a network are ranked according to a scoring function that incorporates anchor text as a term. The scoring function is adjusted so that the ranking of a document targeted by anchor text reflects the terms used in that anchor text. First, properties associated with the anchor text are collected during a crawl of the network. A separate index is then generated that contains an inverted list of the documents and the terms in the anchor text. This index is consulted in response to a query to calculate each document's score, and the scores are used to rank the documents and produce the query results.
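
The steps above (collect anchors while crawling, build a separate anchor-text inverted index keyed by target document, fold an anchor score into ranking) can be sketched as follows. The log-damped term frequency, the IDF form and `anchor_weight` are assumptions chosen for illustration; the abstract does not state the patent's scoring formula.

```python
import math
from collections import defaultdict

def build_anchor_index(links):
    """links: iterable of (source_url, target_url, anchor_text) collected while
    crawling. Returns an inverted list: term -> {target_url: occurrences of the
    term in anchors pointing at that target}."""
    index = defaultdict(lambda: defaultdict(int))
    for _source, target, text in links:
        for term in text.lower().split():
            index[term][target] += 1
    return index

def anchor_score(index, query, doc, n_docs):
    """ASSUMED formula: sum over query terms of log(1 + anchor tf) * IDF."""
    score = 0.0
    for term in query.lower().split():
        postings = index.get(term)          # .get does not create new keys
        if not postings or doc not in postings:
            continue
        idf = math.log(1 + n_docs / len(postings))
        score += math.log(1 + postings[doc]) * idf
    return score

def rank(query, content_scores, index, anchor_weight=0.5):
    """content_scores: {doc: score from the ordinary body-text index}.
    anchor_weight is a HYPOTHETICAL mixing parameter."""
    n = len(content_scores)
    combined = {
        doc: s + anchor_weight * anchor_score(index, query, doc, n)
        for doc, s in content_scores.items()
    }
    return sorted(combined, key=combined.get, reverse=True)
```

Keeping anchor text in its own index, as the abstract describes, lets the anchor evidence be attributed to the target document without rewriting that document's body-text postings.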

Owner:MICROSOFT TECH LICENSING LLC

Automatic short text semantic concept expansion method and system based on open knowledge base

Active | CN103150382A | Simple structure | Easy to calculate | Special data processing applications | Lexical item | Ambiguity

The invention discloses an automatic short-text semantic concept expansion method based on an open knowledge base. Elements of the n-gram sets generated from a short text are linked to their most relevant concepts in the open knowledge base, and expandable semantic concept sets are generated for those elements from a concept relationship matrix and the linked concepts. Only the anchor text information in the knowledge base's documents, rather than their lexical-item and category (directory) information, is used to construct the concept relationship matrix, which makes the matrix easy to construct and compute and avoids the problems of the coarse granularity of category information and of polysemy. In the expansion stage, a context-based semantic similarity method is used that considers both the contextual consistency of the short text and the similarity of concepts at the abstract semantic level, which improves the accuracy of semantic concept expansion.
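
A minimal sketch of the pipeline described above, under several assumptions not taken from the patent: each n-gram is linked to the concept its anchor text most often points to (a simplification of "most relevant"), one matrix entry is the Jaccard overlap of two concepts' anchor-text in-link sets, and candidates are ranked by a hypothetical 50/50 mix of relatedness to the linked concept and to the other concepts in the short text.

```python
from collections import Counter, defaultdict

def ngrams(tokens, max_n=3):
    """All word n-grams of length 1..max_n."""
    return [" ".join(tokens[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(tokens) - n + 1)]

def build_anchor_resources(articles):
    """articles: {article_concept: [(anchor_text, target_concept), ...]}.
    Returns anchor_dict (anchor text -> most frequent target concept) and
    inlinks (concept -> set of articles whose anchors point to it)."""
    targets = defaultdict(Counter)
    inlinks = defaultdict(set)
    for article, anchors in articles.items():
        for text, target in anchors:
            targets[text.lower()][target] += 1
            inlinks[target].add(article)
    anchor_dict = {text: c.most_common(1)[0][0] for text, c in targets.items()}
    return anchor_dict, inlinks

def relatedness(inlinks, a, b):
    """One entry of the concept relationship matrix. ASSUMED measure:
    Jaccard overlap of the two concepts' anchor-text in-link sets."""
    sa, sb = inlinks.get(a, set()), inlinks.get(b, set())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0

def expand(short_text, anchor_dict, inlinks, top_k=3):
    """Link n-grams to concepts, then rank candidate concepts by a HYPOTHETICAL
    50/50 mix of relatedness to the linked concept and to the other linked
    concepts (the short text's context)."""
    linked = {g: anchor_dict[g] for g in ngrams(short_text.lower().split()) if g in anchor_dict}
    concepts = set(linked.values())
    expansions = {}
    for gram, concept in linked.items():
        context = concepts - {concept}
        scored = []
        for cand in inlinks:
            local = relatedness(inlinks, concept, cand)
            if cand in concepts or local == 0.0:
                continue
            ctx = (sum(relatedness(inlinks, cand, c) for c in context) / len(context)
                   if context else 0.0)
            scored.append((0.5 * local + 0.5 * ctx, cand))
        scored.sort(reverse=True)
        expansions[gram] = [cand for _, cand in scored[:top_k]]
    return expansions
```

Because the matrix is derived from anchor links alone, it needs no lexical-item or category data, which matches the construction-cost argument in the abstract.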

Owner:INST OF COMPUTING TECH CHINESE ACAD OF SCI

Pipelined architecture for global analysis and index building

Inactive | US20070282829A1 | Data processing applications | Web data indexing | Information retrieval | Anchor text

Disclosed is a technique for building an index. A new index_{i+1} is built, and an anchor text table_{i+1} and a duplicates table_{i+1} are output, using a store_i, a delta store, and previously generated global analysis computations_i, where the previously generated global analysis computations_i include an anchor text table_i, a rank table_i, and a duplicates table_i. New global analysis computations_{i+1} are generated using the anchor text table_{i+1}, the duplicates table_{i+1}, and the previously generated global analysis computations_i.
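
The data dependencies in the abstract can be sketched as one iteration i to i+1: index_{i+1} is built from store_i, the delta store and the previous global analysis (so it uses anchor text and rank from iteration i), and the new anchor text and duplicates tables then feed global analysis_{i+1}. The store layout, exact-hash duplicate detection and the rank update are hypothetical stand-ins, and the stages run sequentially here rather than as a concurrent pipeline.

```python
import hashlib
from dataclasses import dataclass

@dataclass
class GlobalAnalysis:
    """Global analysis computations_i: the three tables named in the abstract."""
    anchor_text: dict  # target url -> list of anchor strings
    rank: dict         # url -> rank value
    duplicates: dict   # url -> canonical url

def build_index(store, delta, prev):
    """Build index_{i+1} from store_i, the delta store and global analysis_i;
    also emit anchor text table_{i+1} and duplicates table_{i+1}.
    store/delta (ASSUMED layout): {url: (body_text, [(target_url, anchor_text), ...])}."""
    merged = {**store, **delta}  # delta entries replace older versions
    index, anchors, first_by_hash, dups = {}, {}, {}, {}
    for url, (body, outlinks) in merged.items():
        digest = hashlib.sha1(body.encode("utf-8")).hexdigest()
        canonical = first_by_hash.setdefault(digest, url)  # exact-duplicate detection
        dups[url] = canonical
        for target, text in outlinks:
            anchors.setdefault(target, []).append(text)
        if canonical != url:
            continue  # index only one copy of duplicated content
        # The index uses the PREVIOUS iteration's anchor text and rank tables.
        anchor_terms = " ".join(prev.anchor_text.get(url, [])).lower().split()
        for term in set(body.lower().split() + anchor_terms):
            index.setdefault(term, []).append((prev.rank.get(url, 0.0), url))
    for postings in index.values():
        postings.sort(reverse=True)  # highest previous rank first
    return merged, index, anchors, dups

def global_analysis(anchors, dups, prev):
    """Compute global analysis_{i+1}. HYPOTHETICAL rank: average of the previous
    rank and the in-link count, folded onto canonical URLs."""
    inlinks = {}
    for target, texts in anchors.items():
        canon = dups.get(target, target)
        inlinks[canon] = inlinks.get(canon, 0) + len(texts)
    urls = set(prev.rank) | set(inlinks)
    rank = {u: 0.5 * prev.rank.get(u, 0.0) + 0.5 * inlinks.get(u, 0) for u in urls}
    return GlobalAnalysis(anchor_text=anchors, rank=rank, duplicates=dups)

def pipeline_step(store, delta, prev):
    """One iteration i -> i+1, run sequentially here."""
    store_next, index_next, anchors_next, dups_next = build_index(store, delta, prev)
    return store_next, index_next, global_analysis(anchors_next, dups_next, prev)
```

Note the one-iteration lag: anchor text collected while building index_{i+1} only reaches the index in iteration i+2, which is what allows index building and global analysis to be decoupled.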

Owner:GOOGLE LLC

User access content-based real-time personalized information collection method

Inactive | CN105138558A | Accurate collection | Special data processing applications | Personalization | Real time analysis

The invention discloses a real-time personalized information collection method based on the content a user accesses. It comprises the following steps: obtaining the current seed page by analyzing the user's network requests in real time, and extracting the structural information of the webpage; extracting topic keywords from several aspects of that structural information to form a topic keyword entry; extracting the anchor text of each sub-link of the current seed page, segmenting the anchor text into words according to the topic keyword entry, building a vector space model from the segmentation result, and calculating the topic relevance between the sub-link and the current seed page using cosine similarity in that vector space model; treating sub-links whose topic relevance exceeds a set threshold as effective sub-links; building a link topic classification base, and setting seed-link priorities and classifying the current seed link by topic; and calculating the importance of all sub-links in the link topic classification base, ranking them by importance, and downloading and storing the corresponding page information in ranked order.
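
The sub-link filtering steps above (segment anchor text against the topic keyword entry, build term vectors, keep links whose cosine relevance exceeds a threshold) can be sketched as follows. Forward maximum matching is assumed as the dictionary-based segmenter because the abstract does not name one, and the threshold value and keyword weights are hypothetical.

```python
import math
from collections import Counter

def fmm_segment(text, vocabulary, max_len=None):
    """Forward maximum matching: at each position take the longest vocabulary
    word; characters not covered by the vocabulary are emitted singly."""
    max_len = max_len or max((len(w) for w in vocabulary), default=1)
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocabulary:
                tokens.append(piece)
                i += size
                break
    return tokens

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def effective_sublinks(seed_keywords, sublinks, threshold=0.3):
    """seed_keywords: {keyword: weight} for the current seed page (ASSUMED form
    of the topic keyword entry). sublinks: {url: anchor_text}.
    Returns (url, relevance) pairs above the HYPOTHETICAL threshold, best first."""
    vocab = set(seed_keywords)
    results = []
    for url, anchor in sublinks.items():
        terms = [t for t in fmm_segment(anchor.lower(), vocab) if t in vocab]
        relevance = cosine(Counter(terms), seed_keywords)
        if relevance > threshold:
            results.append((url, relevance))
    return sorted(results, key=lambda p: p[1], reverse=True)
```

Character-level matching makes the same segmenter usable for Chinese anchor text, which has no spaces between words; for English input, spaces simply become single-character tokens and are filtered out.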

Owner:SHANDONG UNIV

Automatic object reference identification and linking in a browseable fact repository

Active | US8260785B2 | Digital data processing details | Visual data mining | Web search query | Anchor text

Links between facts associated with objects are automatically created and maintained in a fact repository. Names of objects are automatically identified in the facts and collected into a list of names. The facts are then processed to identify occurrences of these names. Identified names are used as anchor text for search links. A search link includes a search query for a search engine that searches the fact repository for facts associated with objects having the same name.
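
A minimal sketch of the name-to-search-link step, assuming a hypothetical fact schema of `(object_id, attribute, value)` triples where names live under a `"name"` attribute, and a hypothetical search URL prefix `/search?q=`. Names are matched longest first so that, for example, "New York City" wins over "New York".

```python
import html
import re
from urllib.parse import quote_plus

def collect_names(facts):
    """facts: list of (object_id, attribute, value); ASSUMED schema in which
    object names are stored under the 'name' attribute. Longest names first."""
    return sorted({value for _, attr, value in facts if attr == "name"}, key=len, reverse=True)

def link_fact_value(value, names, search_path="/search?q="):
    """Wrap each known object name in a search link whose anchor text is the
    name. search_path is a HYPOTHETICAL endpoint. \\b boundaries assume names
    start and end with word characters."""
    if not names:
        return html.escape(value)
    # Regex alternation is leftmost-first, so longest-first ordering matters.
    pattern = re.compile(r"\b(" + "|".join(re.escape(n) for n in names) + r")\b")
    out, last = [], 0
    for m in pattern.finditer(value):
        out.append(html.escape(value[last:m.start()]))
        name = m.group(1)
        out.append(f'<a href="{search_path}{quote_plus(name)}">{html.escape(name)}</a>')
        last = m.end()
    out.append(html.escape(value[last:]))
    return "".join(out)
```

Following the abstract, the link target is a search query rather than a fixed object ID, so a name shared by several objects resolves to all of their facts.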

Owner:GOOGLE LLC

Method for handling anchor text

Active | US7499913B2 | Data processing applications | Web data indexing | Anchor text | Information retrieval

Disclosed is a method for processing anchor text for information retrieval. A set of anchors that point to a target document is formed. Anchors with the same anchor text are grouped together. Information is computed for each group. Context information is generated for the target document based on the computed information.
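
The four steps above can be sketched as follows. The abstract does not say what information is computed per group, so this sketch assumes two illustrative statistics (anchor count and number of distinct source hosts), a case/whitespace normalization for "same anchor text", and a hypothetical ordering and `max_groups` cap for the resulting context.

```python
from collections import defaultdict
from urllib.parse import urlsplit

def anchor_context(anchors, target, max_groups=5):
    """anchors: iterable of (source_url, target_url, anchor_text).
    Groups anchors pointing at `target` by normalized text and returns
    (text, anchor_count, distinct_source_hosts) tuples, best first."""
    groups = defaultdict(list)
    for source, dest, text in anchors:
        if dest == target:                        # step 1: anchors to the target
            key = " ".join(text.lower().split())  # ASSUMED normalization
            if key:
                groups[key].append(source)        # step 2: group by anchor text
    info = [                                      # step 3: per-group information
        (text, len(sources), len({urlsplit(s).hostname for s in sources}))
        for text, sources in groups.items()
    ]
    # Step 4: ASSUMED ordering that favours text used by many independent hosts.
    info.sort(key=lambda t: (-t[2], -t[1], t[0]))
    return info[:max_groups]
```

Counting distinct hosts rather than raw anchors is one way to keep a single site that repeats the same link text from dominating the target's context.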

Owner:IBM CORP
