2751 results about "Text corpus" patented technology

In linguistics, a corpus (plural corpora) or text corpus is a large and structured set of texts (nowadays usually electronically stored and processed). In corpus linguistics, corpora are used for statistical analysis and hypothesis testing, checking occurrences or validating linguistic rules within a specific language territory.

Method and system for optimally searching a document database using a representative semantic space

Inactive | US6847966B1 | Reduced dimension | Data processing applications | Digital data information retrieval | Singular value decomposition | Subject matter

A term-by-document matrix, which records the frequency of occurrence of each term per document, is compiled from a corpus of documents representative of a particular subject matter. A weighted term dictionary is created using a global weighting algorithm and then applied to the term-by-document matrix, forming a weighted term-by-document matrix. A term vector matrix and a singular value concept matrix are computed by singular value decomposition of the weighted term-by-document matrix. The k largest singular concept values are kept and all others are set to zero, thereby reducing the term vector matrix and the singular value concept matrix to k concept dimensions. The reduced term vector matrix, reduced singular value concept matrix and weighted term dictionary can be used to project pseudo-document vectors, representing documents not appearing in the original document corpus, into a representative semantic space. The similarities of those documents can be ascertained from the position of their respective pseudo-document vectors in the representative semantic space.

Owner:KLDISCOVERY ONTRACK LLC
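
The semantic-space projection in the abstract above matches the standard latent semantic indexing "fold-in". As a sketch only, using notation that is an assumption of this note rather than the patent's own: let $W$ be the weighted term-by-document matrix, $k$ the number of retained concept dimensions, and $d$ the weighted term vector of a document outside the original corpus. Cosine similarity is one common way to compare positions in the reduced space; the abstract does not name a specific measure.

```latex
% Singular value decomposition of the weighted term-by-document matrix
W = U \Sigma V^{\top}

% Keep the k largest singular values and set the rest to zero
W_k = U_k \Sigma_k V_k^{\top}

% Project (fold in) a new document's weighted term vector d
\hat{d} = \Sigma_k^{-1} U_k^{\top} d

% Compare two pseudo-document vectors by their position (assumed: cosine)
\operatorname{sim}(\hat{d}_1, \hat{d}_2)
  = \frac{\hat{d}_1^{\top} \hat{d}_2}{\lVert \hat{d}_1 \rVert \, \lVert \hat{d}_2 \rVert}
```

Here $U_k$ plays the role of the reduced term vector matrix and $\Sigma_k$ the reduced singular value concept matrix.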

Ranking search results by reranking the results based on local inter-connectivity

Inactive | US6526440B1 | Data processing applications | Web data indexing | Interconnectivity | Document preparation

A search engine for searching a corpus improves the relevancy of the results by refining a standard relevancy score based on the interconnectivity of the initially returned set of documents. The search engine obtains an initial set of relevant documents by matching a user's search terms to an index of a corpus. A re-ranking component in the search engine then refines the initially returned document rankings so that documents that are frequently cited in the initial set of relevant documents are preferred over documents that are less frequently cited within the initial set.

Owner:GOOGLE LLC
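
One way to picture the re-ranking step in the abstract above is the following Python sketch. It counts, for each document in the initial result set, how many other documents in that same set cite it, and boosts the standard relevancy score accordingly. The function name `rerank_by_local_links`, the `weight` parameter, and the multiplicative combination are illustrative assumptions; the abstract does not state the actual scoring formula.

```python
def rerank_by_local_links(initial_scores, links, weight=1.0):
    """Re-rank an initial result set by citations from within that set.

    initial_scores: dict mapping doc id -> standard relevancy score.
    links: dict mapping doc id -> set of doc ids that it cites.
    weight: hypothetical tuning knob for the strength of the boost.
    """
    in_set = set(initial_scores)
    local_citations = {doc: 0 for doc in in_set}
    for source, targets in links.items():
        if source not in in_set:
            continue  # only links between initially returned documents count
        for target in targets:
            if target in in_set and target != source:
                local_citations[target] += 1
    # Assumed combination: scale the score by (1 + weight * local citations).
    combined = {
        doc: initial_scores[doc] * (1.0 + weight * local_citations[doc])
        for doc in in_set
    }
    # Highest combined score first; ties broken by doc id for determinism.
    return sorted(in_set, key=lambda doc: (-combined[doc], doc))


scores = {"a": 0.9, "b": 0.8, "c": 0.7}
citations = {"a": {"c"}, "b": {"c"}, "c": set(), "x": {"a"}}
reranked = rerank_by_local_links(scores, citations)
print(reranked)
```

In this toy input, document "c" starts with the lowest score but is cited by both other results, so it moves to the top; the link from "x" is ignored because "x" is not in the initial set.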

Process and system for retrieval of documents using context-relevant semantic profiles

Inactive | US6189002B1 | Minimize time | Data processing applications | Digital computer details | The Internet | Document preparation

A process and system for database storage and retrieval are described along with methods for obtaining semantic profiles from a training text corpus, i.e., text of known relevance, a method for using the training to guide context-relevant document retrieval, and a method for limiting the range of documents that need to be searched after a query. A neural network is used to extract semantic profiles from the text corpus. A new set of documents, such as world wide web pages obtained from the Internet, is then submitted for processing to the same neural network, which computes a semantic profile representation for these pages using the semantic relations learned from profiling the training documents. These semantic profiles are then organized into clusters in order to minimize the time required to answer a query. When a user queries the database, i.e., the set of documents, his or her query is similarly transformed into a semantic profile and compared with the semantic profiles of each cluster of documents. The query profile is then compared with each of the documents in the closest cluster. Documents with the closest weighted match to the query are returned as search results.

Owner:DTI OF WASHINGTON
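
The two-stage lookup described in the abstract above (match the query against clusters first, then against the documents inside one cluster) can be sketched in Python as follows. This is a minimal illustration under stated assumptions: semantic profiles are taken as given numeric vectors (the neural network that produces them is not modeled), each cluster is summarized by the mean of its profiles, and cosine similarity stands in for the "weighted match". The names `cosine`, `centroid`, and `search` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def search(query_profile, clusters, top_k=2):
    """clusters: dict cluster name -> dict doc id -> profile vector."""
    # Stage 1: pick the cluster whose summary profile is closest to the query.
    best = max(
        clusters,
        key=lambda name: cosine(query_profile, centroid(list(clusters[name].values()))),
    )
    # Stage 2: rank only the documents inside that cluster.
    docs = clusters[best]
    ranked = sorted(docs, key=lambda doc: (-cosine(query_profile, docs[doc]), doc))
    return best, ranked[:top_k]


clusters = {
    "sports": {"d1": [1.0, 0.0, 0.0], "d2": [0.9, 0.1, 0.0]},
    "finance": {"d3": [0.0, 1.0, 0.0], "d4": [0.0, 0.8, 0.2]},
}
best_cluster, hits = search([0.1, 1.0, 0.1], clusters)
print(best_cluster, hits)
```

Restricting stage 2 to a single cluster is what limits the range of documents searched per query, at the cost of missing relevant documents that were assigned to other clusters.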

Process for the document management and computer-assisted translation of documents utilizing document corpora constructed by intelligent agents

Inactive | US20030154071A1 | Efficient processing | Significant cost | Natural language translation | Digital computer details | Metalanguage | External storage

A method of document management utilizing document corpora including gathering a source corpus of documents in electronic form, modeling the source corpus in terms of document and domain structure information to identify corpus enhancement parameters, using a metalanguage to electronically tag the source corpus, programming the corpus enhancement parameters into an intelligent agent, and using the intelligent agent to search external repositories to find similar terms and structures, and return them to the source corpora, whereby the source corpus is enhanced to form a unicorpus.

Owner:KENT STATE UNIV

Context vector generation and retrieval

Inactive | US7251637B1 | Reduce search time | Rapid positioning | Digital computer details | Biological neural network models | Co-occurrence | Document preparation

A system and method for generating context vectors for use in storage and retrieval of documents and other information items. Context vectors represent conceptual relationships among information items by quantitative means. A neural network operates on a training corpus of records to develop relationship-based context vectors based on word proximity and co-importance using a technique of “windowed co-occurrence”. Relationships among context vectors are deterministic, so that a context vector set has one logical solution, although it may have a plurality of physical solutions. No human knowledge, thesaurus, synonym list, knowledge base, or conceptual hierarchy, is required. Summary vectors of records may be clustered to reduce searching time, by forming a tree of clustered nodes. Once the context vectors are determined, records may be retrieved using a query interface that allows a user to specify content terms, Boolean terms, and / or document feedback. The present invention further facilitates visualization of textual information by translating context vectors into visual and graphical representations. Thus, a user can explore visual representations of meaning, and can apply human visual pattern recognition skills to document searches.

Owner:FAIR ISAAC & CO INC
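
The "windowed co-occurrence" statistic named in the abstract above can be illustrated with a short Python sketch. It only gathers raw counts of which words appear within a fixed window of each other; the patent's neural network training, which turns such proximity information into context vectors, is not modeled here. The function name `windowed_cooccurrence` and the symmetric `window` parameter are assumptions made for this illustration.

```python
from collections import Counter, defaultdict


def windowed_cooccurrence(tokens, window=2):
    """Count, for every token, the tokens seen within `window` positions of it."""
    counts = defaultdict(Counter)
    for i, word in enumerate(tokens):
        start = max(0, i - window)
        stop = min(len(tokens), i + window + 1)
        for j in range(start, stop):
            if j != i:  # a token does not co-occur with its own position
                counts[word][tokens[j]] += 1
    return counts


tokens = "the cat sat on the mat".split()
cooc = windowed_cooccurrence(tokens, window=1)
print(dict(cooc["the"]))
```

Each row of the resulting table could serve as a crude, unlearned stand-in for a context vector: words that share many neighbours end up with similar rows.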

Methods and Systems of Automatic Ontology Population

Inactive | US20090012842A1 | Voting apparatus | Drawing from basic elements | Internet searching | Medical record

Methods and systems for creating a knowledge graph that relates terms in a corpus of literature in the form of an assertion and provides a probability of the veracity of the assertion are disclosed herein. Various aspects of the invention are directed to and / or involve knowledge graphs and structured digital abstracts (SDAs) offering a machine readable representation of statements in a corpus of literature. Various methods and systems of the invention can automatically extract, structure, and visualize the statements. Such graphs and abstracts can be useful for a variety of applications including, but not necessarily limited to, semantic-based search tools for search of electronic medical records, specific content verticals (e.g. newswire, finance, history) and general internet searches.

Owner:COUNSYL INC

Rule-based learning of word pronunciations from training corpora

Inactive | US6411932B1 | Improve pronunciation | Speech recognition | Speech synthesis | Algorithm | Human language

A text-to-pronunciation system (11) includes a large training set of word pronunciations (19) and an extractor for extracting language-specific information from the training set to produce pronunciations for words not in its training set. A learner (13) forms pronunciation guesses for words in the training set and finds a transformation rule that improves the guesses. A rule applier (15) applies the transformation rule found to the guesses. The learner (13) then repeats the search for another rule and the rule applier (15) applies each new rule, so as to find the rules that improve the guesses the most.

Owner:TEXAS INSTR INC

Search systems and methods with integration of user annotations

Inactive | US20050234891A1 | Easy to operate | Data processing applications | Web data indexing | Paper document | Document preparation

Computer systems and methods allow users to annotate content items found in a corpus such as the World Wide Web. Annotations, which can include any descriptive and / or evaluative metadata related to a document, are collected from a user and stored in association with that user. Users are able to annotate and view their annotations for any document they encounter while interacting with the corpus, including hits returned in a search of the corpus. Users are also able to search their annotations or to limit searches to documents they have annotated. Metadata from annotations can also be aggregated across users and aggregated metadata applied in generating search results.

Owner:R2 SOLUTIONS

Method and apparatus for training transliteration model and parsing statistic model, method and apparatus for transliteration

Inactive | US7853444B2 | Natural language translation | Speech analysis | Transliteration | Parsing

The present invention provides a method and apparatus for training a parsing statistic model, and a method and apparatus for transliteration. Said parsing statistic model is to be used in transliteration between a single-syllable language and a multi-syllable language and includes sub-syllable parsing probabilities of said multi-syllable language. Said method for training the parsing statistic model comprises: inputting a bilingual proper name list as a corpus, said bilingual proper name list including a plurality of proper names of said multi-syllable language and corresponding proper names of said single-syllable language respectively; parsing each of said plurality of proper names of the multi-syllable language in said bilingual proper name list into a sub-syllable sequence using parsing rules; determining whether said parsing is correct according to the corresponding proper name of said single-syllable language in said bilingual proper name list; and training said parsing statistic model based on the result of parsing that is determined as correct.

Owner:KK TOSHIBA

Search processing with automatic categorization of queries

Active | US7620628B2 | Data processing applications | Digital data information retrieval | Data mining | Concept network

Search results are processed using search requests, including analyzing received queries in order to provide a more sophisticated understanding of the information being sought. A concept network is generated from a set of queries by parsing the queries into units and defining various relationships between the units. From these concept networks, queries can be automatically categorized into categories, or more generally, can be associated with one or more nodes of a taxonomy. The categorization can be used to alter the search results or the presentation of the results to the user. As an example of alterations of search results or presentation, the presentation might include a list of “suggestions” for related search query terms. As other examples, the corpus searched might vary depending on the category, or the ordering or selection of the results to present to the user might vary depending on the category. Categorization might be done using a learned set of query-node pairs where a pair maps a particular query to a particular node in the taxonomy. The learned set might be initialized from a manual indication of which queries go with which nodes and enhanced as more searches are performed. One method of enhancement involves tracking post-query click activity to identify how a category estimate of a query might have varied from an actual category for the query as evidenced by the category of the post-query click activity, e.g., the particular hits of the search results that the user selected following the query. Another method involves determining relationships between units in the form of clusters and using clustering to modify the query-node pairs.

Owner:R2 SOLUTIONS

Method and apparatus for automatic entity disambiguation

Active | US7672833B2 | Efficiently find | Improve throughput | Natural language data processing | Office automation | Semi-structured data | Weight of evidence

Entity disambiguation resolves which names, words, or phrases in text correspond to distinct persons, organizations, locations, or other entities in the context of an entire corpus. The invention is based largely on language-independent algorithms. Thus, it is applicable not only to unstructured text from arbitrary human languages, but also to semi-structured data, such as citation databases and the disambiguation of named entities mentioned in wire transfer transaction records for the purpose of detecting money-laundering activity. The system uses multiple types of context as evidence for determining whether two mentions correspond to the same entity and it automatically learns the weight of evidence of each context item via corpus statistics. The invention uses multiple search keys to efficiently find pairs of mentions that correspond to the same entity, while skipping billions of unnecessary comparisons, yielding a system with very high throughput that can be applied to truly massive data.

Owner:FAIR ISAAC & CO INC

Data processing system for autonomously building speech identification and tagging data

Active | US20090306979A1 | Natural language data processing | Speech recognition | Spoken language | Speech identification

A method, system, and computer program product for autonomously transcribing and building tagging data of a conversation. A corpus processing agent monitors a conversation and utilizes a speech recognition agent to identify the spoken languages, speakers, and emotional patterns of speakers of the conversation. While monitoring the conversation, the corpus processing agent determines emotional patterns by monitoring voice modulation of the speakers and evaluating the context of the conversation. When the conversation is complete, the corpus processing agent determines synonyms and paraphrases of spoken words and phrases of the conversation taking into consideration any localized dialect of the speakers. Additionally, metadata of the conversation is created and stored in a link database, for comparison with other processed conversations. A corpus, a transcription of the conversation containing metadata links, is then created. The corpus processing agent also determines the frequency of spoken keywords and phrases and compiles a popularity index.

Owner:NUANCE COMM INC

Three-dimensional display of document set

Inactive | US7555496B1 | Data processing applications | Special data processing applications | Documentation | Data science

A method for spatializing text content for enhanced visual browsing and analysis. The invention is applied to large text document corpora such as digital libraries, regulations and procedures, archived reports, and the like. The text content from these sources may be transformed to a spatial representation that preserves informational characteristics from the documents. The three-dimensional representation may then be visually browsed and analyzed in ways that avoid language processing and that reduce the analysts' effort.

Owner:BATTELLE MEMORIAL INST

Combined statistical and rule-based part-of-speech tagging for text-to-speech synthesis

Inactive | US8719006B2 | Natural language data processing | Special data processing applications | Part of speech | Text to speech synthesis

In response to a word of a text sequence, a first part-of-speech (POS) tag is generated using a statistical part-of-speech (POS) tagger based on a corpus of trained text sequences, each representing a likely POS of a word for a given text sequence. A second POS tag is generated using a rule-based POS tagger based on a set of one or more rules associated with a type of an application associated with the text sequence. A final POS tag is assigned to the word of the text sequence for TTS synthesis based on the first POS tag and the second POS tag.

Owner:APPLE INC
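
The combination of a statistical tag and a rule-based tag described in the abstract above might look roughly like the Python sketch below. Everything specific here is an assumption for illustration: the statistical tagger is reduced to "most frequent tag per word in a tagged corpus", there is a single made-up rule for noun/verb homographs, and the final tag simply prefers the rule's answer when a rule fires. The abstract does not say how the two tags are actually reconciled.

```python
from collections import Counter, defaultdict

# Hypothetical list of noun/verb homographs that the rule applies to.
AMBIGUOUS_WORDS = {"record", "present"}


def train_statistical_tagger(tagged_corpus):
    """Return a word -> most frequent tag lookup built from tagged sentences."""
    frequencies = defaultdict(Counter)
    for sentence in tagged_corpus:
        for word, tag in sentence:
            frequencies[word.lower()][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in frequencies.items()}


def rule_based_tag(word, previous_tag):
    """Made-up rule: a homograph directly after a determiner is a noun."""
    if word.lower() in AMBIGUOUS_WORDS and previous_tag == "DET":
        return "NOUN"
    return None  # no rule fired


def tag_sentence(words, model, default_tag="NOUN"):
    tags = []
    previous = None
    for word in words:
        first_tag = model.get(word.lower(), default_tag)  # statistical tag
        second_tag = rule_based_tag(word, previous)       # rule-based tag
        # Assumed policy: the rule wins whenever it produces a tag.
        final_tag = second_tag if second_tag is not None else first_tag
        tags.append(final_tag)
        previous = final_tag
    return tags


corpus = [
    [("the", "DET"), ("record", "NOUN")],
    [("they", "PRON"), ("record", "VERB"), ("songs", "NOUN")],
    [("we", "PRON"), ("record", "VERB")],
]
model = train_statistical_tagger(corpus)
noun_reading = tag_sentence(["the", "record"], model)
verb_reading = tag_sentence(["they", "record"], model)
print(noun_reading, verb_reading)
```

For text-to-speech the distinction matters because the noun and verb readings of a word such as "record" are stressed differently.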

Disambiguating user intent in conversational interaction system for large corpus information retrieval

Active | US20140040274A1 | Database querying | Digital data processing details | Interaction systems | Ambiguity

A method of disambiguating user intent in conversational interactions for information retrieval is disclosed. The method includes providing access to a set of content items with metadata describing the content items and providing access to structural knowledge showing semantic relationships and links among the content items. The method further includes providing a user preference signature, receiving a first input from the user that is intended by the user to identify at least one desired content item, and determining an ambiguity index of the first input. If the ambiguity index is high, the method determines a query input based on the first input and at least one of the structural knowledge, the user preference signature, a location of the user, and the time of the first input and selects a content item based on comparing the query input and the metadata associated with the content item.

Owner:VEVEO INC

System and method for providing question and answers with deferred type evaluation

Active | US20090292687A1 | Confidence | Allow use | Digital data information retrieval | Digital data processing details | Questions and answers | Data mining

A system, method and computer program product for conducting questions and answers with deferred type evaluation based on any corpus of data. The method includes processing a query, including waiting until a “Type” (i.e. a descriptor) is determined and a candidate answer is provided; the Type is not required to be part of a predetermined ontology but is only a lexical / grammatical item. Then, a search is conducted for evidence that the candidate answer has the required lexical answer type (LAT) (e.g., as determined by a matching function that can leverage a parser, a semantic interpreter and / or a simple pattern matcher). In another embodiment, an attempt may be made to match the LAT to a known ontological type and then look up a candidate answer in an appropriate knowledge base, database, or the like determined by that type. Then, all the evidence from all the different ways of determining that the candidate answer has the expected LAT is combined and one or more answers are delivered to a user.

Owner:IBM CORP

Part-of-speech tagging using latent analogy

Inactive | US20090089058A1 | Semantic analysis | Speech recognition | Part of speech | Vector space model

Methods and apparatuses to assign part-of-speech tags to words are described. An input sequence of words is received. A global fabric of a corpus having training sequences of words may be analyzed in a vector space. Global semantic information associated with the input sequence of words may be extracted based on the analyzing. A part-of-speech (POS) tag may be assigned to a word of the input sequence based on POS tags from pertinent words in relevant training sequences identified using the global semantic information. The input sequence may be mapped into a vector space. A neighborhood associated with the input sequence may be formed in the vector space, wherein the neighborhood represents one or more training sequences that are globally relevant to the input sequence.

Owner:APPLE INC

Method and system for analyzing text

Inactive | US20110208511A1 | Semantic analysis | Forecasting | Semantic space | Data mining

An apparatus for providing a control input signal for an industrial process or technical system having one or more controllable elements includes elements for generating a semantic space for a text corpus, and elements for generating a norm from one or more reference words or texts, the or each reference word or text being associated with a defined respective value on a scale, and the norm being calculated as a reference point or set of reference points in the semantic space for the or each reference word or text with its associated respective scale value. The apparatus further includes elements for reading at least one target word included in the text corpus, elements for predicting a value of a variable associated with the target word based on the semantic space and the norm, and elements for providing the predicted value in a control input signal to the industrial process or technical system. A method for predicting a value of a variable associated with a target word is also disclosed together with an associated system and computer readable medium.

Owner:STROSSLE INT

Systems and methods for collecting user annotations

Active | US20050216457A1 | Enhance and personalize search | Enhance and personalize and browsing operation | Data processing applications | Web data indexing | Paper document | Document preparation

Computer systems and methods allow users to annotate content items found in a corpus such as the World Wide Web. Annotations, which can include any descriptive and / or evaluative metadata related to a document, are collected from a user and stored in association with that user. Users are able to annotate and view their annotations for any document they encounter while interacting with the corpus, including hits returned in a search of the corpus. Users are also able to search their annotations or to limit searches to documents they have annotated. Metadata from annotations can also be aggregated across users and aggregated metadata applied in generating search results.

Owner:R2 SOLUTIONS

Apparatus and method for building domain-specific language models

Inactive | US6188976B1 | Satisfactory quality | Speech recognition | Special data processing applications | A domain | Mixture modeling

Disclosed is a method and apparatus for building a domain-specific language model for use in language processing applications, e.g., speech recognition. A reference language model is generated based on a relatively small seed corpus containing linguistic units relevant to the domain. An external corpus containing a large number of linguistic units is accessed. Using the reference language model, linguistic units which have a sufficient degree of relevance to the domain are extracted from the external corpus. The reference language model is then updated based on the seed corpus and the extracted linguistic units. The process may be repeated iteratively until the language model is of satisfactory quality. The language building technique may be further enhanced by combining it with mixture modeling or class-based modeling.

Owner:NUANCE COMM INC
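
The iterative loop in the abstract above (build a reference model from a seed corpus, pull relevant units from an external corpus, rebuild, repeat) can be sketched in Python under simplifying assumptions. Here the "language model" is an add-one smoothed unigram model, a "linguistic unit" is a tokenized sentence, and relevance is the average log-probability per token compared against a fixed `threshold`. These choices, and the names `train_unigram`, `avg_logprob`, and `build_domain_model`, are illustrative only; the abstract does not specify the model type or the relevance test.

```python
import math
from collections import Counter


def train_unigram(sentences, vocab_size):
    """Return a function giving the add-one smoothed log-probability of a token."""
    counts = Counter(token for sentence in sentences for token in sentence)
    total = sum(counts.values())

    def logprob(token):
        return math.log((counts[token] + 1) / (total + vocab_size))

    return logprob


def avg_logprob(sentence, logprob):
    """Average per-token log-probability; assumes a non-empty sentence."""
    return sum(logprob(token) for token in sentence) / len(sentence)


def build_domain_model(seed, external, threshold, max_rounds=5):
    vocab = {token for sentence in seed + external for token in sentence}
    selected = []
    remaining = list(external)
    for _ in range(max_rounds):
        # Reference model from the seed corpus plus everything extracted so far.
        model = train_unigram(seed + selected, len(vocab))
        newly = [s for s in remaining if avg_logprob(s, model) >= threshold]
        if not newly:
            break  # nothing else is relevant enough; stop iterating
        selected.extend(newly)
        remaining = [s for s in remaining if s not in newly]
    return train_unigram(seed + selected, len(vocab)), selected


seed = [["patient", "has", "fever"], ["patient", "needs", "rest"]]
external = [["patient", "has", "cough"], ["stock", "prices", "fell"]]
final_model, extracted = build_domain_model(seed, external, threshold=-2.5)
print(extracted)
```

In this toy run the medical sentence clears the threshold on the first round and the financial one never does, so only the former is added to the domain data.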

Converting text-to-speech and adjusting corpus

Active | US20080270139A1 | Improve voice quality | Speech recognition | Speech synthesis | Speech sound | Text to speech conversion

The present invention provides a method and apparatus for text to speech conversion, and a method and apparatus for adjusting a corpus. The method for text to speech conversion comprises: a text analysis step for parsing the text to obtain descriptive prosody annotations of the text based on a TTS model generated from a first corpus; a prosody parameter prediction step for predicting the prosody parameter of the text according to the result of the text analysis step; and a speech synthesis step for synthesizing speech of said text based on said prosody parameter of the text; wherein the descriptive prosody annotations of the text include a prosody structure for the text, and the prosody structure of the text is adjusted according to a target speech speed for the synthesized speech. Because the prosody structure is adjusted to the target speech speed, the synthesized speech will have improved quality.

Owner:CERENCE OPERATING CO

Word boundary probability estimating, probabilistic language model building, kana-kanji converting, and unknown word model building

Inactive | US20080228463A1 | Accuracy of recognition | Improve abilities | Natural language translation | Speech recognition | Corpus restiforme | Word model

A word n-gram probability is calculated with high accuracy in a situation where a first corpus, which is a relatively small corpus containing manually segmented word information, and a second corpus, which is a relatively large corpus, are given as a training corpus, that is, storage containing vast quantities of sample sentences. Vocabulary including contextual information is expanded from words occurring in the first corpus of relatively small size to words occurring in the second corpus of relatively large size by using a word n-gram probability estimated from an unknown word model and the raw corpus. The first corpus (word-segmented) is used for calculating n-grams and the probability that the boundary between two adjacent characters will be the boundary of two words (segmentation probability). The second corpus (word-unsegmented), in which probabilistic word boundaries are assigned based on information in the first corpus (word-segmented), is used for calculating word n-grams.

Owner:INT BUSINESS MASCH CORP

System and method for hybrid speech synthesis

Active | US20080270140A1 | Cost-efficiently | Solve the real problem | Speech synthesis | Speech corpus | Speech sound

A speech synthesis system receives symbolic input describing an utterance to be synthesized. In one embodiment, different portions of the utterance are constructed from different sources, one of which is a speech corpus recorded from a human speaker whose voice is to be modeled. The other sources may include other human speech corpora or speech produced using Rule-Based Speech Synthesis (RBSS). At least some portions of the utterance may be constructed by modifying prototype speech units to produce adapted speech units that are contextually appropriate for the utterance. The system concatenates the adapted speech units with the other speech units to produce a speech waveform. In another embodiment, a speech unit of a speech corpus recorded from a human speaker lacks transitions at one or both of its edges. A transition is synthesized using RBSS and concatenated with the speech unit in producing a speech waveform for the utterance.

Owner:NOVASPEECH

Query translation through dictionary adaptation

Active | US8775154B2 | Digital data information retrieval | Digital data processing details | Cross-language information retrieval | Cross lingual

Cross-lingual information retrieval is disclosed, comprising: translating a received query from a source natural language into a target natural language; performing a first information retrieval operation on a corpus of documents in the target natural language using the translated query to retrieve a set of pseudo-feedback documents in the target natural language; re-translating the received query from the source natural language into the target natural language using a translation model derived from the set of pseudo-feedback documents in the target natural language; and performing a second information retrieval operation on the corpus of documents in the target natural language using the re-translated query to retrieve an updated set of documents in the target natural language.

Owner:CONDUENT BUSINESS SERVICES LLC

Large Scale Distributed Syntactic, Semantic and Lexical Language Models

Inactive | US20130325436A1 | Semantic analysis | Speech recognition | Expectation–maximization algorithm | Model parameters

A composite language model may include a composite word predictor. The composite word predictor may include a first language model and a second language model that are combined according to a directed Markov random field. The composite word predictor can predict a next word based upon a first set of contexts and a second set of contexts. The first language model may include a first word predictor that is dependent upon the first set of contexts. The second language model may include a second word predictor that is dependent upon the second set of contexts. Composite model parameters can be determined by multiple iterations of a convergent N-best list approximate Expectation-Maximization algorithm and a follow-up Expectation-Maximization algorithm applied in sequence, wherein the convergent N-best list approximate Expectation-Maximization algorithm and the follow-up Expectation-Maximization algorithm extract the first set of contexts and the second set of contexts from a training corpus.

Owner:WRIGHT STATE UNIVERSITY

Ranking search results by reranking the results based on local inter-connectivity

Inactive | US6725259B1 | Data processing applications | Web data indexing | Interconnectivity | Document preparation

A search engine for searching a corpus improves the relevancy of the results by refining a standard relevancy score based on the interconnectivity of the initially returned set of documents. The search engine obtains an initial set of relevant documents by matching a user's search terms to an index of a corpus. A re-ranking component in the search engine then refines the initially returned document rankings so that documents that are frequently cited in the initial set of relevant documents are preferred over documents that are less frequently cited within the initial set.
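
A minimal sketch of the re-ranking idea follows. It assumes the link graph is available as a mapping from each document to the documents it cites, and that the local citation count is simply added to the original relevance score; the function name, the `weight` parameter, and the additive formula are illustrative assumptions, since the abstract does not give the scoring formula.

```python
from collections import Counter

def rerank_by_local_citations(initial_results, links, weight=1.0):
    """initial_results: list of (doc_id, relevance_score) pairs.
    links: dict mapping doc_id -> set of doc_ids that it cites."""
    initial_ids = {doc_id for doc_id, _ in initial_results}

    # Count only citations that come from inside the initial result set.
    local_citations = Counter()
    for source in initial_ids:
        for target in links.get(source, ()):
            if target in initial_ids and target != source:
                local_citations[target] += 1

    # Assumed combination: relevance plus weighted local citation count.
    rescored = [(doc_id, score + weight * local_citations[doc_id])
                for doc_id, score in initial_results]
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)

initial = [("a", 1.0), ("b", 0.9), ("c", 0.8)]
links = {"a": {"c"}, "b": {"c"}, "c": {"x"}}  # "x" is outside the initial set
reranked = rerank_by_local_citations(initial, links)
```

Here document "c" starts with the lowest relevance score but is cited by both other documents in the initial set, so it moves to the top.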

Owner:GOOGLE LLC

System and method for suggestion mining

Active | US20130096909A1 | Natural language data processing | Special data processing applications | Grammatical relation | Document preparation

A system and method for extraction of suggestions for improvement from a corpus of documents, such as customer reviews, are disclosed. A structured terminology provided for a topic includes a set of semantic classes, each including a set of terms. A thesaurus of terms relating to suggestions of improvement is provided. Text elements of text strings in the documents which are instances of terms in the structured terminology are labeled with the corresponding semantic class and text elements which are instances of terms in the thesaurus are also labeled. A set of patterns is applied to the labeled text strings to identify suggestions of improvement expressions. The patterns define syntactic relations between text elements, some of which are required to be instances of one of the terms in a particular semantic class or thesaurus. A set of suggestions for improvements is output based on the identified suggestions of improvement expressions.
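
The labeling-then-pattern pipeline can be sketched roughly as below. The terminology, thesaurus, and class names are hypothetical, and the single pattern used here (a labeled part term co-occurring with a suggestion cue in one sentence) is a deliberately crude stand-in for the syntactic-relation patterns the abstract describes, which would require a parser.

```python
# Hypothetical structured terminology for a "printer" topic: class -> terms.
TERMINOLOGY = {"PART": {"tray", "cartridge", "screen"}}
# Hypothetical thesaurus of suggestion-of-improvement cue words.
SUGGESTION_THESAURUS = {"should", "wish", "needs", "could"}

def label_tokens(sentence):
    # Attach a semantic-class or SUGGESTION label to each matching token.
    labeled = []
    for raw in sentence.lower().split():
        word = raw.strip(".,!?")
        label = None
        for semantic_class, terms in TERMINOLOGY.items():
            if word in terms:
                label = semantic_class
        if word in SUGGESTION_THESAURUS:
            label = "SUGGESTION"
        labeled.append((word, label))
    return labeled

def matches_pattern(labeled, required_class="PART"):
    # Assumed pattern: co-occurrence of a class term and a suggestion cue.
    labels = {label for _, label in labeled}
    return required_class in labels and "SUGGESTION" in labels

def mine_suggestions(documents):
    found = []
    for document in documents:
        for sentence in document.split("."):
            sentence = sentence.strip()
            if sentence and matches_pattern(label_tokens(sentence)):
                found.append(sentence)
    return found

reviews = ["The tray should be larger. The screen is bright.",
           "I wish it were cheaper."]
suggestions = mine_suggestions(reviews)
```

Only the first sentence contains both a labeled part and a suggestion cue; the third sentence has a cue but no term from the terminology, so it is not reported.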

Owner:XEROX CORP

Information data retrieval, where the data is organized in terms, documents and document corpora

Active | US20050149494A1 | Easy to solve | Easy to Feedback | Digital data processing details | Object oriented databases | Paper document | Document preparation

The invention relates to improved solutions for information retrieval, wherein the information is represented by digitized text data. This data is further presumed to be organized in terms (431-438), documents and document corpora, where each document contains at least one term (431-438) and each document corpus contains at least one document. Based on a concept vector (420-424), which conceptually classifies the contents of each document, a term-to-concept vector is generated for each term (431-438) in the document corpus. The term-to-concept vector describes a relationship between the term (431) and each of the concept vectors (420-424). On the basis of the term-to-concept vectors for the document corpus, a term-term matrix is generated which describes a term-to-term relationship between all the terms (431-438) in the document corpus. The term-term matrix may then be processed and used for retrieving information from the document corpus, such as the fact that a first term (431) is related to a second term (436).
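
One way to picture the term-to-concept vectors and the term-term matrix is the sketch below. It assumes each document already has a concept vector of fixed length, builds a term's vector by summing the concept vectors of the documents that contain it, and fills the term-term matrix with cosine similarities; both the summation and the cosine measure are assumptions chosen for illustration, not details taken from the patent.

```python
import math

def term_to_concept_vectors(doc_terms, doc_concepts):
    """doc_terms: dict doc_id -> set of terms.
    doc_concepts: dict doc_id -> list of concept weights."""
    vectors = {}
    for doc_id, terms in doc_terms.items():
        concept_vec = doc_concepts[doc_id]
        for term in terms:
            acc = vectors.setdefault(term, [0.0] * len(concept_vec))
            for i, weight in enumerate(concept_vec):
                acc[i] += weight  # accumulate over documents containing the term
    return vectors

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def term_term_matrix(vectors):
    # Rows and columns follow the sorted term order returned alongside.
    terms = sorted(vectors)
    matrix = [[cosine(vectors[a], vectors[b]) for b in terms] for a in terms]
    return terms, matrix

doc_terms = {"d1": {"engine", "piston"},
             "d2": {"engine", "valve"},
             "d3": {"poem"}}
doc_concepts = {"d1": [1.0, 0.0], "d2": [1.0, 0.0], "d3": [0.0, 1.0]}

vectors = term_to_concept_vectors(doc_terms, doc_concepts)
terms, matrix = term_term_matrix(vectors)
```

In this toy corpus "piston" and "valve" never share a document, yet the matrix relates them because both occur in documents of the same concept.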

Owner:ELUCIDON GROUP

Search processing with automatic categorization of queries

Active | US20060122979A1 | Data processing applications | Digital data information retrieval | Concept network | Learning set

Search results are processed using search requests, including analyzing received queries in order to provide a more sophisticated understanding of the information being sought. A concept network is generated from a set of queries by parsing the queries into units and defining various relationships between the units. From these concept networks, queries can be automatically categorized into categories, or more generally, can be associated with one or more nodes of a taxonomy. The categorization can be used to alter the search results or the presentation of the results to the user. As an example of alterations of search results or presentation, the presentation might include a list of “suggestions” for related search query terms. As other examples, the corpus searched might vary depending on the category or the ordering or selection of the results to present to the user might vary depending on the category. Categorization might be done using a learned set of query-node pairs where a pair maps a particular query to a particular node in the taxonomy. The learned set might be initialized from a manual indication of which queries go with which nodes and enhanced as more searches are performed. One method of enhancement involves tracking post-query click activity to identify how a category estimate of a query might have varied from an actual category for the query as evidenced by the category of the post-query click activity, e.g., the particular hits of the search results that the user selected following the query. Another method involves determining relationships between units in the form of clusters and using clustering to modify the query-node pairs.
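
The learned set of query-node pairs and its click-based enhancement might look roughly like the following sketch. The class name, the vote-counting scheme, and the majority rule are hypothetical simplifications; the concept network and the clustering step mentioned in the abstract are not modeled.

```python
from collections import Counter, defaultdict

class QueryCategorizer:
    """Toy learned set of query -> taxonomy-node pairs, updated from clicks."""

    def __init__(self, seed_pairs):
        # seed_pairs: manually assigned (query, node) pairs used to initialize.
        self.votes = defaultdict(Counter)
        for query, node in seed_pairs:
            self.votes[self._normalize(query)][node] += 1

    @staticmethod
    def _normalize(query):
        return " ".join(query.lower().split())

    def categorize(self, query):
        # Return the node with the most votes, or None for unseen queries.
        votes = self.votes.get(self._normalize(query))
        if not votes:
            return None
        return votes.most_common(1)[0][0]

    def record_click(self, query, clicked_node):
        # Post-query click activity counts as evidence of the actual category.
        self.votes[self._normalize(query)][clicked_node] += 1

categorizer = QueryCategorizer([("Jaguar", "animals")])
before = categorizer.categorize("jaguar")
categorizer.record_click("jaguar", "autos")
categorizer.record_click("jaguar", "autos")
after = categorizer.categorize("jaguar")
```

After two clicks on results in the "autos" category, the estimate for "jaguar" shifts away from the manually seeded node.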

Owner:R2 SOLUTIONS

System and method for dynamically evaluating latent concepts in unstructured documents

Inactive | US6978274B1 | Database updating | Data processing applications | Frequency of occurrence | Documentation

A system and method for dynamically evaluating latent concepts in unstructured documents is disclosed. A multiplicity of concepts are extracted from a set of unstructured documents into a lexicon. The lexicon uniquely identifies each concept and a frequency of occurrence. A frequency of occurrence representation is created for the documents set. The frequency representation provides an ordered corpus of the frequencies of occurrence of each concept. A subset of concepts is selected from the frequency of occurrence representation filtered against a pre-defined threshold. A group of weighted clusters of concepts selected from the concepts subset is generated. A matrix of best fit approximations is determined for each document weighted against each group of weighted clusters of concepts.
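
The first steps of this abstract (lexicon, ordered frequencies, threshold filtering) can be sketched as below. The sketch treats lowercase whitespace tokens as "concepts" and interprets the pre-defined threshold as a minimum and maximum count; both are assumptions made for illustration, and the clustering and best-fit steps are left out.

```python
from collections import Counter

def build_lexicon(documents):
    # Map each concept (here simply a lowercase token) to its frequency.
    lexicon = Counter()
    for text in documents:
        lexicon.update(text.lower().split())
    return lexicon

def select_concepts(lexicon, min_count, max_count):
    # Order concepts by frequency, then keep those inside the assumed band.
    ordered = lexicon.most_common()
    return [concept for concept, count in ordered
            if min_count <= count <= max_count]

documents = ["the pump failed", "the pump leaked", "the valve failed"]
lexicon = build_lexicon(documents)
selected = select_concepts(lexicon, min_count=2, max_count=2)
```

The band drops "the", which appears in every document, as well as the terms that occur only once, leaving the mid-frequency concepts for later clustering.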

Owner:NUIX NORTH AMERICA
