
18,021 results about "Text database clustering/classification" patented technology

Methods and apparatus for serving relevant advertisements

The relevance of advertisements to a user's interests is improved. In one implementation, the content of a web page is analyzed to determine a list of one or more topics associated with that web page. An advertisement is considered to be relevant to that web page if it is associated with keywords belonging to the list of one or more topics. One or more of these relevant advertisements may be provided for rendering in conjunction with the web page or related web pages.
Owner:GOOGLE LLC
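The matching rule in this abstract can be illustrated with a minimal Python sketch. The abstract does not say how page topics are derived, so the keyword lookup below, along with the names `TOPIC_TERMS`, `ADS`, `page_topics`, and `relevant_ads`, is a hypothetical stand-in rather than the patented method.

```python
# Hypothetical topic vocabulary: topic -> words that signal it (assumption).
TOPIC_TERMS = {
    "cycling": {"bike", "bicycle", "helmet"},
    "cooking": {"recipe", "oven"},
}

# Hypothetical ad inventory: ad id -> keywords the advertiser associated with it.
ADS = {
    "ad-helmets": {"cycling"},
    "ad-knives": {"cooking"},
    "ad-gps": {"cycling", "hiking"},
}


def page_topics(page_text, topic_terms):
    """Return the set of topics whose signal words occur in the page text."""
    words = set(page_text.lower().split())
    return {topic for topic, terms in topic_terms.items() if words & terms}


def relevant_ads(page_text, topic_terms, ads):
    """An ad is relevant if any of its keywords is among the page's topics."""
    topics = page_topics(page_text, topic_terms)
    return sorted(ad for ad, keywords in ads.items() if keywords & topics)


page = "How to choose a bike helmet for commuting"
print(page_topics(page, TOPIC_TERMS))
print(relevant_ads(page, TOPIC_TERMS, ADS))
```

A production system would presumably replace the word lookup with a real content analyzer; only the final set intersection mirrors the rule stated in the abstract.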

Real-time typing assistance

An apparatus and method are disclosed for providing feedback and guidance to touch screen device users to improve the text entry user experience and performance through the use of indicators such as feedback semaphores. Also disclosed are suggestion candidates, which allow a user to quickly select next words to add to text input data, or replacement words for words that have been designated as incorrect. According to one embodiment, a method comprises receiving text input data, providing an indicator for possible correction of the text input data, displaying suggestion candidates associated with alternative words for the data, receiving a single touch screen input selecting one of the suggestion candidates, and modifying the input data using the word associated with the selected suggestion candidate.
Owner:MICROSOFT TECH LICENSING LLC

Search query processing to provide category-ranked presentation of search results

A search engine system displays the results of a multiple-category search according to levels of relevance of the categories to a user's search query. A query server receives a search query from a user and identifies, within each of multiple item categories, a set of items that satisfy the query. The sets of items are used to generate, for each of the multiple categories, a score that reflects a level of significance or relevance of the category to the search. The scores may be based, for example, on the number of hits within each category relative to the total number of items in that category, the popularity levels of items that satisfy the query, a personal profile of the user, or a combination thereof. The categories are then presented to the user, together with the most relevant items within each category, in the order of highest to lowest category relevance.
Owner:A9 COM INC
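One of the scoring options named in this abstract, hits within a category relative to the category's total size, can be sketched as follows. The function name `rank_categories`, the `top_n` cutoff, and the sample data are assumptions for illustration; popularity and user-profile signals are left out.

```python
def rank_categories(hits_by_category, category_sizes, top_n=3):
    """Order categories by (number of hits / total items in the category).

    hits_by_category: category -> list of matching item ids, best first.
    category_sizes:   category -> total number of items in that category.
    Returns a list of (category, score, top items), highest score first.
    """
    scored = []
    for category, hits in hits_by_category.items():
        if not hits:
            continue  # categories without hits are not shown
        score = len(hits) / category_sizes[category]
        scored.append((category, score, hits[:top_n]))
    # Sort by descending score; break ties alphabetically for determinism.
    scored.sort(key=lambda entry: (-entry[1], entry[0]))
    return scored


hits = {"Books": ["b1", "b2", "b3", "b4"], "Music": ["m1", "m2"], "Toys": []}
sizes = {"Books": 1000, "Music": 100, "Toys": 50}
ranked = rank_categories(hits, sizes)
for category, score, items in ranked:
    print(category, score, items)
```

Here Music outranks Books even though it has fewer hits, because two hits out of 100 items is a larger share than four out of 1000.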

Computer-implemented system and method for text-based document processing

A computer-implemented system and method for processing text-based documents. A frequency of terms data set is generated for the terms appearing in the documents. Singular value decomposition is performed upon the frequency of terms data set in order to form projections of the terms and documents into a reduced dimensional subspace. The projections are normalized, and the normalized projections are used to analyze the documents.
Owner:SAS INSTITUTE
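The pipeline in this abstract (term frequencies, singular value decomposition, normalized projections) can be sketched in dependency-free Python. This is an illustration under stated assumptions, not the patented implementation: it projects documents only, uses raw counts, and obtains the leading singular directions by power iteration on the document Gram matrix instead of calling a library SVD routine. All function names and the toy matrix are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def top_eigenpairs(sym, k, iters=300):
    """Leading k eigenpairs of a symmetric PSD matrix (power iteration + deflation)."""
    n = len(sym)
    work = [row[:] for row in sym]
    pairs = []
    for _ in range(k):
        vec = [1.0 + 0.1 * j for j in range(n)]  # deterministic start vector
        norm = math.sqrt(dot(vec, vec))
        vec = [x / norm for x in vec]
        for _ in range(iters):
            nxt = [dot(row, vec) for row in work]
            norm = math.sqrt(dot(nxt, nxt))
            if norm < 1e-12:
                break
            vec = [x / norm for x in nxt]
        lam = max(dot(vec, [dot(row, vec) for row in work]), 0.0)
        pairs.append((lam, vec))
        for r in range(n):  # deflate: remove the component just found
            for c in range(n):
                work[r][c] -= lam * vec[r] * vec[c]
    return pairs


def document_projections(doc_term, k):
    """Project documents onto the top-k singular directions (rows of U * Sigma)."""
    gram = [[dot(a, b) for b in doc_term] for a in doc_term]
    pairs = top_eigenpairs(gram, k)
    return [[math.sqrt(lam) * vec[d] for lam, vec in pairs]
            for d in range(len(doc_term))]


def normalize(rows):
    """Scale each projection to unit length; zero rows are left unchanged."""
    out = []
    for row in rows:
        norm = math.sqrt(dot(row, row))
        out.append([x / norm for x in row] if norm > 0 else row[:])
    return out


# Rows are documents, columns are term counts (toy data).
DOC_TERM = [[2, 1, 0, 0], [1, 2, 0, 0], [0, 0, 3, 1], [0, 0, 1, 2]]
proj = normalize(document_projections(DOC_TERM, 2))
print(dot(proj[0], proj[1]), dot(proj[0], proj[2]))
```

Because the projections are unit length, the dot product of two rows is their cosine similarity in the reduced subspace, which is one way the normalized projections could be used to analyze the documents.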

Method, Apparatus and Program Storage Device For Providing Customizable, Immediate and Radiating Menus For Accessing Applications and Actions

A method, apparatus and program storage device for providing customizable, immediate and radiating menus for accessing applications and actions. Upon initiation of a predetermined user action, such as a right-click operation, a primary menu is displayed and a second radial menu is displayed proximate the primary menu with the cursor position at a predetermined location for minimizing cursor manipulation for selecting a menu item from the second radial menu.
Owner:IBM CORP

Method for data and text mining and literature-based discovery

Text searching is achieved by techniques including phrase frequency analysis and phrase-co-occurrence analysis. In many cases, factor matrix analysis is also advantageously applied to select high technical content phrases to be analyzed for possible inclusion within a new query. The described techniques may be used to retrieve data, determine levels of emphasis within a collection of data, determine the desirability of conflating search terms, detect symmetry or asymmetry between two text elements within a collection of documents, generate a taxonomy of documents within a collection, and perform literature-based problem solving. (This abstract is intended only to aid those searching patents, and is not intended to limit the disclosure of claims in any manner.)
Owner:NAVY UNITED STATES OF AMERICA AS REPRESENTED BY THE SECY OF THE

Computer system, method, and program product for generating a data structure for information retrieval, and an associated graphical user interface

A computer system for generating data structures for information retrieval of documents stored in a database. The computer system includes a neighborhood patch generation subsystem for defining a patch of nodes having predetermined similarities in a hierarchy structure. The neighborhood patch generation subsystem includes a hierarchy generation subsystem for generating a hierarchy structure upon the document-keyword vectors and a patch definition subsystem. The computer system also comprises a cluster estimation subsystem for generating cluster data of the document-keyword vectors using the similarities of patches.
Owner:IBM CORP

Methods for generating natural language processing systems

Methods are presented for generating a natural language model. The method may comprise: ingesting training data representative of documents to be analyzed by the natural language model; generating a hierarchical data structure comprising at least two topical nodes into which the training data is to be subdivided by the natural language model; selecting a plurality of documents among the training data to be annotated; generating an annotation prompt for each document configured to elicit an annotation about said document indicating which node among the at least two topical nodes said document is to be classified into; receiving the annotation based on the annotation prompt; and generating the natural language model using an adaptive machine learning process configured to determine patterns among the annotations for how the documents in the training data are to be subdivided according to the at least two topical nodes of the hierarchical data structure.
Owner:100 CO GLOBAL HLDG LLC

Confidently adding snippets of search results to clusters of objects

Systems and methods are provided for matching snippets of search results to clusters of objects. A system adds a data snippet of a search result to a cluster of objects. The system calculates a confidence score for the add based on the recency, a job title, an email address, and/or a phone number associated with the data snippet. The system stores the add in the customer accessible database if the confidence score is sufficiently high for the add to be stored in the customer accessible database. The system generates a notice for review if the confidence score is not sufficiently high for the add to be stored in the customer accessible database.
Owner:SALESFORCE COM INC
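The store-or-review decision in this abstract can be sketched as a threshold on an additive score. The abstract names the signals (recency, job title, email address, phone number) but not their weights or the threshold, so the point values, the one-year recency cutoff, and the names `Snippet`, `confidence`, and `route` are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Snippet:
    age_days: int                      # how old the search-result snippet is
    job_title: Optional[str] = None
    email: Optional[str] = None
    phone: Optional[str] = None


def confidence(snippet):
    """Score from 0 to 100 using hypothetical integer weights."""
    score = 40 if snippet.age_days <= 365 else 10  # recency (assumed cutoff)
    score += 20 if snippet.job_title else 0
    score += 20 if snippet.email else 0
    score += 20 if snippet.phone else 0
    return score


def route(snippet, threshold=70):
    """Store the add when confidence is high enough, otherwise flag it for review."""
    return "store" if confidence(snippet) >= threshold else "review"


fresh = Snippet(age_days=30, job_title="CTO", email="a@example.com")
stale = Snippet(age_days=900, phone="555-0100")
print(route(fresh), route(stale))
```

Integer points are used instead of fractional weights so that the comparison against the threshold is exact.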

Conceptual world representation natural language understanding system and method

A Natural Language Understanding system is provided for indexing of free text documents. The system according to the invention utilizes typographical and functional segmentation of text to identify those portions of free text that carry meaning. The system then uses words and multi-word terms and phrases identified in the free text to identify concepts in the free text. The system uses a lexicon of terms linked to a formal ontology that is independent of a specific language to extract concepts from the free text based on the words and multi-word terms in the free text. The formal ontology contains both language independent domain knowledge concepts and language dependent linguistic concepts that govern the relationships between concepts and contain the rules about how language works. The system according to the current invention may preferably be used to index medical documents and assign codes from independent coding systems, such as SNOMED, ICD-9 and ICD-10. The system according to the current invention may also preferably make use of syntactic parsing to improve the efficiency of the method.
Owner:NUANCE COMM INC

Search engine that applies feedback from users to improve search results

The present invention is directed to methods of and systems for ranking results returned by a search engine. A method in accordance with the invention comprises determining a formula having variables and parameters, wherein the formula is for computing a relevance score for a document and a search query; and ranking the document based on the relevance score. Preferably, determining the formula comprises tuning the parameters based on user input. Preferably, the parameters are determined using a machine learning technique, such as one that includes a form of statistical classification.
Owner:PINTEREST

System and method for automatic document management

A system for managing documents, comprising: interfaces to a user interface, providing an application programming interface, a database of document images, a remote server, configured to communicate a text representation of the document from the optical character recognition engine to the remote server, and to receive from the remote server a classification of the document; and logic configured to receive commands from the user interface, and to apply the classifications received from the remote server to the document images through the interface to the database. A corresponding method is also provided.
Owner:AUTOFILE INC

Techniques for similarity analysis and data enrichment using knowledge sources

The present disclosure relates to performing similarity metric analysis and data enrichment using knowledge sources. A data enrichment service can compare an input data set to reference data sets stored in a knowledge source to identify similarly related data. A similarity metric can be calculated corresponding to the semantic similarity of two or more datasets. The similarity metric can be used to identify datasets based on their metadata attributes and data values, enabling easier indexing and high performance retrieval of data values. An input data set can be labeled with a category based on the data set having the best match with the input data set. The similarity of an input data set with a data set provided by a knowledge source can be used to query a knowledge source to obtain additional information about the data set. The additional information can be used to provide recommendations to the user.
Owner:ORACLE INT CORP

System and method for geographically organizing and classifying businesses on the world-wide web

A method and search engine for classifying a source publishing a document on a portion of a network, includes steps of electronically receiving a document, based on the document, determining a source which published the document, and assigning a code to the document based on whether data associated with the document published by the source matches with data contained in a database. An intelligent geographic- and business topic-specific resource discovery system facilitates local commerce on the World-Wide Web and also reduces search time by accurately isolating information for end-users. Distinguishing and classifying business pages on the Web by business categories using Standard Industrial Classification (SIC) codes is achieved through an automatic iterative process.
Owner:META PLATFORMS INC

Object detection and detection confidence suitable for autonomous driving

In various examples, detected object data representative of locations of detected objects in a field of view may be determined. One or more clusters of the detected objects may be generated based at least in part on the locations and features of the cluster may be determined for use as inputs to a machine learning model(s). A confidence score, computed by the machine learning model(s) based at least in part on the inputs, may be received, where the confidence score may be representative of a probability that the cluster corresponds to an object depicted at least partially in the field of view. Further examples provide approaches for determining ground truth data for training object detectors, such as for determining coverage values for ground truth objects using associated shapes, and for determining soft coverage values for ground truth objects.
Owner:NVIDIA CORP

Method and system for naming a cluster of words and phrases

The present invention provides a method, system and computer program for naming a cluster, or a hierarchy of clusters, of words and phrases that have been extracted from a set of documents. The invention takes these clusters as the input and generates appropriate labels for the clusters using a lexical database. Naming involves first finding out all possible word senses for all the words in the cluster, using the lexical database; and then augmenting each word sense with words that are semantically similar to that word sense to form respective definition vectors. Thereafter, word sense disambiguation is done to find out the most relevant sense for each word. Definition vectors are clustered into groups. Each group represents a concept. These concepts are thereafter ranked based on their support. Finally, a pre-specified number of words and phrases from the definition vectors of the dominant concepts are selected as labels, based on their generality in the lexical database. Therefore, the labels may not necessarily consist of the original words in the cluster. A hierarchy of clusters is named in a recursive fashion starting from leaf clusters. Dominant concepts in child clusters are propagated into their parent to reduce the labeling complexity of parent clusters.
Owner:MICRO FOCUS LLC

System and method for performing efficient document scoring and clustering

A system and method for providing efficient document scoring of concepts within a document set is described. A frequency of occurrence of at least one concept within a document retrieved from the document set is determined. A concept weight is analyzed reflecting a specificity of meaning for the at least one concept within the document. A structural weight is analyzed reflecting a degree of significance based on structural location within the document for the at least one concept. A corpus weight is analyzed inversely weighing a reference count of occurrences for the at least one concept within the document. A score associated with the at least one concept is evaluated as a function of the frequency, concept weight, structural weight, and corpus weight.
Owner:NUIX NORTH AMERICA
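The abstract says a concept's score is a function of frequency, concept weight, structural weight, and corpus weight, but does not give the function. A minimal sketch, assuming a simple product and an inverse corpus weight of `1 / (1 + reference_count)` (both assumptions, as is the name `concept_score`):

```python
def concept_score(frequency, concept_weight, structural_weight, reference_count):
    """Combine the four factors named in the abstract into one score.

    frequency:         occurrences of the concept in the document
    concept_weight:    specificity of meaning (higher = more specific)
    structural_weight: significance of where the concept appears
    reference_count:   occurrences used for the inverse corpus weight
    The product form and the 1 / (1 + n) damping are illustrative choices.
    """
    corpus_weight = 1.0 / (1.0 + reference_count)
    return frequency * concept_weight * structural_weight * corpus_weight


rare = concept_score(3, 0.5, 2.0, reference_count=4)
common = concept_score(3, 0.5, 2.0, reference_count=99)
print(rare, common)
```

With identical frequency and weights, the concept with the higher reference count scores lower, which matches the "inversely weighing" wording.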

System and method for dynamically evaluating latent concepts in unstructured documents

A system and method for dynamically evaluating latent concepts in unstructured documents is disclosed. A multiplicity of concepts are extracted from a set of unstructured documents into a lexicon. The lexicon uniquely identifies each concept and a frequency of occurrence. A frequency of occurrence representation is created for the document set. The frequency representation provides an ordered corpus of the frequencies of occurrence of each concept. A subset of concepts is selected from the frequency of occurrence representation filtered against a pre-defined threshold. A group of weighted clusters of concepts selected from the concepts subset is generated. A matrix of best fit approximations is determined for each document weighted against each group of weighted clusters of concepts.
Owner:NUIX NORTH AMERICA

System And Method For Displaying Relationships Between Electronically Stored Information To Provide Classification Suggestions Via Inclusion

A system and method for providing reference documents as a suggestion for classifying uncoded documents is provided. A set of reference electronically stored information items, each associated with a classification code, is designated. One or more of the reference electronically stored information items is combined with a set of uncoded electronically stored information items. Clusters of the uncoded electronically stored information items and the one or more reference electronically stored information items are generated. Relationships between the uncoded electronically stored information items and the one or more reference electronically stored information items in at least one cluster are visually depicted as suggestions for classifying the uncoded electronically stored information items in that cluster.
Owner:NUIX NORTH AMERICA

Discovering terms using statistical corpus analysis

Inactive; US20160117386A1; Semantic analysis; Digital data processing details; Software; Corpus analysis
Software that extracts contextually relevant terms from a text sample (or corpus) by performing the following steps: (i) identifying a first term from a corpus, based, at least in part, on a set of initial contextual characteristic(s), where each initial contextual characteristic of the set of initial contextual characteristic(s) relates to the contextual use of at least one category related term of a set of category related term(s) in the corpus; (ii) adding the first term to the set of category related term(s), thereby creating a revised set of category related term(s) and a set of first term contextual characteristic(s), where each first term contextual characteristic of the set of first term contextual characteristic(s) relates to the contextual use of the first term in the corpus; and (iii) identifying a second term from the corpus, based, at least in part, on the set of first term contextual characteristic(s).
Owner:IBM CORP

Dynamic information extraction with self-organizing evidence construction

A data analysis system with dynamic information extraction and self-organizing evidence construction finds numerous applications in information gathering and analysis, including the extraction of targeted information from voluminous textual resources. One disclosed method involves matching text with a concept map to identify evidence relations, and organizing the evidence relations into one or more evidence structures that represent the ways in which the concept map is instantiated in the evidence relations. The text may be contained in one or more documents in electronic form, and the documents may be indexed on a paragraph level of granularity. The evidence relations may self-organize into the evidence structures, with feedback provided to the user to guide the identification of evidence relations and their self-organization into evidence structures. A method of extracting information from one or more documents in electronic form includes the steps of clustering the document into clustered text; identifying patterns in the clustered text; and matching the patterns with the concept map to identify evidence relations such that the evidence relations self-organize into evidence structures that represent the ways in which the concept map is instantiated in the evidence relations.
Owner:TECHTEAM GOVERNMENT SOLUTIONS

Data driven natural language event detection and classification

Active; US20170357716A1; Detection is relatively straightforward; Natural language data processing; Sound input/output; Event type; User device
Systems and processes for operating a digital assistant are provided. In accordance with one or more examples, a method includes, at a user device with one or more processors and memory, receiving unstructured natural language information from at least one user. The method also includes, in response to receiving the unstructured natural language information, determining whether event information is present in the unstructured natural language information. The method further includes, in accordance with a determination that event information is present within the unstructured natural language information, determining whether an agreement on an event is present in the unstructured natural language information. The method further includes, in accordance with a determination that an agreement on an event is present, determining an event type of the event and providing an event description based on the event type.
Owner:APPLE INC

Social Graph Based Recommender

Personalized sorted lists of data items for users within an online social network can be generated. Users within the social network are profiled based on their interests. Concepts are segmented in the ontological database into clusters of concepts that are shared by several user profiles. A social graph is defined in which nodes represent the users within the social network and edges represent the explicit connections between the users. A neighborhood graph for each concept cluster is defined. Multilayered social affinity graphs are defined. Data items acted upon by users within the social network in a given time interval are identified. Users within the social network that have acted upon the identified data items are determined. One or more layers of the social affinity graphs are selected for each identified item. Initial endorsement values are injected into the nodes for each identified item. The endorsement values are propagated across the selected layers of the social affinity graphs for each identified item until a stopping criterion is met. A sorted list of items acted upon by other users is generated for each user within the social network.
Owner:CASCAAD
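The inject-and-propagate step in this abstract can be sketched on a single graph layer. The abstract does not specify the update rule or the stopping criterion, so the damped neighbor-averaging update, the `damping` and `tol` parameters, and the name `propagate` are assumptions; the multilayer affinity graphs and concept clusters are not modeled.

```python
def propagate(edges, seeds, damping=0.5, tol=1e-6, max_iters=100):
    """Spread endorsement values over an undirected graph until they settle.

    edges: user -> list of connected users (adjacency list).
    seeds: user -> initial endorsement injected for one item.
    Each round, a node keeps its seed and adds a damped average of its
    neighbors' values; iteration stops when no value moves more than tol.
    """
    nodes = set(edges) | set(seeds) | {n for ns in edges.values() for n in ns}
    values = {n: seeds.get(n, 0.0) for n in nodes}
    for _ in range(max_iters):
        updated = {}
        for node in nodes:
            neighbors = edges.get(node, [])
            incoming = (sum(values[m] for m in neighbors) / len(neighbors)
                        if neighbors else 0.0)
            updated[node] = seeds.get(node, 0.0) + damping * incoming
        delta = max(abs(updated[n] - values[n]) for n in nodes)
        values = updated
        if delta < tol:  # stopping criterion (assumed)
            break
    return values


graph = {"alice": ["bob"], "bob": ["alice", "carol"], "carol": ["bob"], "dave": []}
endorsement = propagate(graph, {"alice": 1.0})  # alice acted on the item
print(sorted(endorsement.items(), key=lambda kv: -kv[1]))
```

Running this once per item and sorting each user's values across items would give a per-user ranked list, with closer connections to the acting user producing higher endorsement.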

Suspicious message processing and incident response

The present invention relates to methods, network devices, and machine-readable media for an integrated environment for automated processing of reports of suspicious messages, and furthermore, to a network for distributing information about detected phishing attacks.
Owner:COFENSE INC

Text emotion classification method based on the joint deep learning model

The invention provides a text emotion classification method based on a joint deep learning model. The method is designed to solve the curse of dimensionality and data sparsity problems that arise with existing support vector machines and other shallow classification methods. The method comprises: 1) processing each word in the text data and using the word2vec tool to train on each processed word so as to obtain a word vector dictionary; 2) obtaining the matrix M of each sentence, training the matrix M with the LSTM layer and converting it into a vector with fixed dimensions, improving the input layer, and generating d-dimensional h word vectors with context semantic relations; 3) using a CNN as a trainable feature detector to extract features from the d-dimensional h word vectors with context semantic relations; and 4) connecting the extracted features in order and outputting the probability of each classification, wherein the classification with the maximal probability value is the predicted classification. The invention is applied to the natural language processing field.
Owner:HARBIN INST OF TECH

Method and system for providing representative phrase

A method and system for providing a representative phrase corresponding to a real time (current time) popular keyword. The method and system may extend a representative criterion word, determined by analyzing morphemes of words in documents grouped into a cluster, and may combine the extended representative criterion word and the popular keyword, thereby providing the representative phrases. The method and system may display the popular keyword and the representative phrases on a web page, or the like.
Owner:NHN CORP