70 results about "Word sense" patented technology

In linguistics, a word sense is one of the meanings of a word. For example, the word "play" takes on a different meaning in each sentence that uses it, based on hints the rest of the sentence gives us. People and computers, as they read words, must use a process called word-sense disambiguation to find the correct meaning of a word. This process uses context to narrow the possible senses down to the probable ones. The context includes such things as the ideas conveyed by adjacent words and nearby phrases, the known or probable purpose and register of the conversation or document, and the orientation (time and place) implied or expressed. The disambiguation is thus context-sensitive.
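
The context-based narrowing described above can be illustrated with a minimal sketch of a simplified Lesk-style overlap heuristic: choose the sense whose gloss shares the most words with the sentence. The sense labels, glosses, and stopword list are made up for illustration and are not taken from any real lexicon or from the patents listed on this page.

```python
# Hypothetical mini sense inventory for "play"; the glosses are illustrative only.
SENSES = {
    "play/drama": "a dramatic work performed by actors on a stage in a theater",
    "play/game": "take part in a game or sport for fun with a ball or toys",
    "play/music": "perform music on an instrument such as a piano or guitar",
}

STOPWORDS = {"a", "an", "the", "in", "on", "of", "or", "for", "to",
             "with", "by", "as", "such", "we", "at"}


def tokenize(text):
    """Lowercase, strip basic punctuation, and drop stopwords."""
    words = (w.strip(".,;:!?\"'").lower() for w in text.split())
    return {w for w in words if w and w not in STOPWORDS}


def disambiguate(sentence, senses=SENSES):
    """Return the sense whose gloss shares the most words with the sentence."""
    context = tokenize(sentence)
    return max(senses, key=lambda s: len(context & tokenize(senses[s])))


print(disambiguate("We went to see the play at the theater"))
print(disambiguate("She can play the piano"))
```

With these toy glosses, "theater" pulls the first sentence toward the drama sense and "piano" pulls the second toward the music sense. Ties fall back to the first sense listed, which a real system would need to handle more carefully.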

Speech-enabled language translation system and method enabling interactive user supervision of translation and speech recognition accuracy

A system and method for a highly interactive style of speech-to-speech translation is provided. The interactive procedures enable a user to recognize, and if necessary correct, errors in both speech recognition and translation, thus providing more robust translation output than would otherwise be possible. The interactive techniques for monitoring and correcting word ambiguity errors during automatic translation, search, or other natural language processing tasks depend upon the correlation of Meaning Cues and their alignment with, or mapping into, the word senses of third party lexical resources, such as those of a machine translation or search lexicon. This correlation and mapping can be carried out through the creation and use of a database of Meaning Cues, i.e., SELECT. Embodiments described above permit the intelligent building and application of this database, which can be viewed as an interlingua, or language-neutral set of meaning symbols, applicable for many purposes. Innovative techniques for interactive correction of server-based speech recognition are also described.
Owner:ZAMA INNOVATIONS LLC

Word sense disambiguation

Systems and methods for word sense disambiguation, including discerning one or more senses or occurrences, distinguishing between senses or occurrences, and determining a meaning for a sense or occurrence of a subject term. In a collection of documents containing terms and a reference collection containing at least one meaning associated with a term, the method includes forming a vector space representation of terms and documents. In some embodiments, the vector space is a latent semantic index vector space. In some embodiments, occurrences are clustered to discern or distinguish a sense of a term. In preferred embodiments, meaning of a sense or occurrence is assigned based on either correlation with an external reference source, or proximity to a reference source that has been indexed into the space.
Owner:RELATIVITY ODA LLC
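
As a rough illustration of assigning a meaning by proximity to a reference source indexed into the same vector space, the sketch below uses plain bag-of-words vectors and cosine similarity. This is a deliberate simplification: the abstract mentions a latent semantic index space, which is not implemented here, and the reference descriptions for "bank" are hypothetical.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term-frequency vector (a stand-in for an LSI projection)."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm = (math.sqrt(sum(x * x for x in u.values()))
            * math.sqrt(sum(x * x for x in v.values())))
    return dot / norm if norm else 0.0


# Hypothetical reference collection: one short description per meaning of "bank".
REFERENCE = {
    "financial institution": "money deposit loan account interest cash",
    "river edge": "river water shore mud fishing stream",
}


def assign_meaning(occurrence_context, reference=REFERENCE):
    """Pick the reference meaning closest to the occurrence in the vector space."""
    occ = vectorize(occurrence_context)
    return max(reference, key=lambda m: cosine(occ, vectorize(reference[m])))


print(assign_meaning("she opened an account at the bank to deposit cash"))
print(assign_meaning("they sat on the bank of the river fishing"))
```

Replacing `vectorize` with a projection into a reduced-dimension space would bring the sketch closer to the latent semantic indexing variant, at the cost of needing a document collection to fit the projection.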

Method and system for naming a cluster of words and phrases

The present invention provides a method, system and computer program for naming a cluster, or a hierarchy of clusters, of words and phrases that have been extracted from a set of documents. The invention takes these clusters as the input and generates appropriate labels for the clusters using a lexical database. Naming involves first finding out all possible word senses for all the words in the cluster, using the lexical database; and then augmenting each word sense with words that are semantically similar to that word sense to form respective definition vectors. Thereafter, word sense disambiguation is done to find out the most relevant sense for each word. Definition vectors are clustered into groups. Each group represents a concept. These concepts are thereafter ranked based on their support. Finally, a pre-specified number of words and phrases from the definition vectors of the dominant concepts are selected as labels, based on their generality in the lexical database. Therefore, the labels may not necessarily consist of the original words in the cluster. A hierarchy of clusters is named in a recursive fashion starting from leaf clusters. Dominant concepts in child clusters are propagated into their parent to reduce the labeling complexity of parent clusters.
Owner:MICRO FOCUS LLC

Internet searching using semantic disambiguation and expansion

The invention provides a system and a method of searching for information in a database using a query. The method comprises the steps of: disambiguating the query to identify keyword senses associated with the query; disambiguating information in the database according to the keyword senses; indexing the information in the database according to the keyword senses; expanding the keyword senses to include relevant semantic synonyms for the keyword senses to create a list of expanded keyword senses; searching the database to find relevant information for the query using the expanded keyword senses; and providing search results of the included information containing the keyword senses and other semantically related word senses. The system comprises modules which disambiguate queries and information and index the information in a database of word senses.
Owner:IDILIA
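
The query-side steps of the abstract above (disambiguate the query, expand the chosen senses with synonyms, then search) might be sketched as follows. The sense inventory, cue words, synonyms, and documents are all hypothetical, and the sketch omits the database-side disambiguation and sense-based indexing that the abstract also describes.

```python
# Hypothetical sense inventory: each keyword sense lists cue words and synonyms.
SENSE_INVENTORY = {
    "java": {
        "java#language": {"cues": {"code", "compiler", "program"},
                          "synonyms": {"jvm", "jdk"}},
        "java#coffee": {"cues": {"cup", "brew", "roast"},
                        "synonyms": {"coffee", "espresso"}},
    }
}

# Hypothetical documents, each reduced to the set of terms it contains.
DOCUMENTS = {
    "doc1": {"jdk", "install", "guide"},
    "doc2": {"espresso", "machine", "review"},
    "doc3": {"java", "compiler", "flags"},
}


def disambiguate_query(query_terms):
    """Map each ambiguous query term to the sense with the most cue words in the query."""
    chosen = {}
    for term in query_terms:
        senses = SENSE_INVENTORY.get(term)
        if senses:
            chosen[term] = max(senses, key=lambda s: len(senses[s]["cues"] & query_terms))
    return chosen


def expand(query_terms, chosen):
    """Add the synonyms of each chosen sense to the query terms."""
    expanded = set(query_terms)
    for term, sense in chosen.items():
        expanded |= SENSE_INVENTORY[term][sense]["synonyms"]
    return expanded


def search(query):
    """Return the sorted ids of documents sharing a term with the expanded query."""
    terms = set(query.lower().split())
    expanded = expand(terms, disambiguate_query(terms))
    return sorted(doc for doc, words in DOCUMENTS.items() if words & expanded)


print(search("java compiler"))
```

Because the documents here are matched on raw terms rather than on disambiguated senses, a document that literally contains "java" matches under either sense. Indexing by sense, as the abstract describes, is what would remove that residual ambiguity.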

Method and apparatus for cross-lingual communication

A system and method for a highly interactive style of speech-to-speech translation is provided. The interactive procedures enable a user to recognize, and if necessary correct, errors in both speech recognition and translation, thus providing more robust translation output than would otherwise be possible. The interactive techniques for monitoring and correcting word ambiguity errors during automatic translation, search, or other natural language processing tasks depend upon the correlation of Meaning Cues and their alignment with, or mapping into, the word senses of third party lexical resources, such as those of a machine translation or search lexicon. This correlation and mapping can be carried out through the creation and use of a database of Meaning Cues, i.e., SELECT. Embodiments described above permit the intelligent building and application of this database, which can be viewed as an interlingua, or language-neutral set of meaning symbols, applicable for many purposes. Innovative techniques for interactive correction of server-based speech recognition are also described.
Owner:ZAMA INNOVATIONS LLC

Method and device for determining suggest word

The application discloses a method and a device for determining a suggest word. Based on the relevance of word characteristics and category characteristics, the method and the device jointly consider the relevance between a candidate word and the inquired word and the relevance between the candidate word and the user's field of interest. Candidate words with relatively high relevance to both the inquired word and the user's field of interest are selected as suggest words, so that the resulting suggest word is highly relevant to the inquired word and to the user's interests in both word meaning and word category. When suggest words are determined for the same inquired word from different users, the users' points of interest can be distinguished effectively, so that a suggest word reflecting each user's needs can be determined. Because the relevance of word category is also considered, the suggest word can be determined accurately according to the user's field of interest even if the inquired word has different meanings in different fields. According to the application, the method and the device can effectively reduce the workload of determining suggest words and improve the efficiency of doing so.
Owner:ALIBABA GRP HLDG LTD

Document clustering methods, document cluster label disambiguation methods, document clustering apparatuses, and articles of manufacture

Document clustering methods, document cluster label disambiguation methods, document clustering apparatuses, and articles of manufacture are described. In one aspect, a document clustering method includes providing a document set comprising a plurality of documents, providing a cluster comprising a subset of the documents of the document set, using a plurality of terms of the documents, providing a cluster label indicative of subject matter content of the documents of the cluster, wherein the cluster label comprises a plurality of word senses, and selecting one of the word senses of the cluster label.
Owner:BATTELLE MEMORIAL INST

Word segmentation algorithm-based log parsing method and word segmentation algorithm-based log parsing system

The invention relates to the technical field of log audit and safety management, and aims at providing a word segmentation algorithm-based log parsing method and a word segmentation algorithm-based log parsing system. The word segmentation algorithm-based log parsing method comprises the following steps: performing segmentation on a log, performing word sense analysis on segmentation results, performing word sense filtration on obtained segmentation results with word sense tagging, performing feature extraction on the obtained filtered segmentation results with the word sense tagging, performing feature matching on obtained word sense order feature codes, and performing semantic parsing on obtained semantic parsing rules; the word segmentation algorithm-based log parsing system comprises a segmentation module, a word sense analysis module, a word sense filtration module, a word order feature extraction module, a feature matching module and a semantic parsing module. According to the word segmentation algorithm-based log parsing method and the word segmentation algorithm-based log parsing system disclosed by the invention, the difficulty and complexity of log parsing are greatly reduced, and therefore the efficiency of performing parsing rule development on the log is increased; the word segmentation algorithm-based log parsing method and the word segmentation algorithm-based log parsing system can be better adapted to certain changes of a log format.
Owner:HANGZHOU ANHENG INFORMATION TECH CO LTD

Word Sense Disambiguation Using Emergent Categories

Disclosed herein is a computer implemented method and system for word sense disambiguation in a natural language sentence. The natural language sentence is parsed for identifying possible parts of speech for each term and identifying possible phrase structures. Terms comprising one or more linguistic roles are identified. The possible sense combinations for the terms with linguistic roles are identified. Emergent categories are applied to identify possible valid senses for each of the terms with identified linguistic roles. Linguistic role pairs are identified from among the terms identified with linguistic roles. The correspondence functions with the correspondence function types matching the identified linguistic role pairs are identified from an emergent categories database. The pair-wise senses for each term are compared with the identified linguistic roles to identify the possible sense combinations. The possible senses are inferred for each term with identified linguistic roles in the natural language sentence and previous sentences.
Owner:TRIGENT SOFTWARE

Word sense disambiguation method and system

The invention provides a word sense disambiguation method and system. In the method, semantic information contained in an ontology or a classification system with a hierarchical structure is used to carry out semantic disambiguation on a target word. The method comprises the following steps: inputting a target word w with various word senses (w1, w2, ..., wn); extracting the concept of the target word from the related ontology and a concept context in the ontology; grading the various word senses of the target word based on the concept context; and selecting a proper word sense of the target word according to the grading result. Because the invention considers various related semantic features in the context (the concept context) of the target word, the accuracy rate of word sense disambiguation is noticeably improved.
Owner:NEC (CHINA) CO LTD

Emoji word sense disambiguation

The present disclosure generally relates to systems and processes for emoji word sense disambiguation. In one example process, a word sequence is received. A word-level feature representation is determined for each word of the word sequence and a global semantic representation for the word sequence is determined. For a first word of the word sequence, an attention coefficient is determined based on a congruence between the word-level feature representation of the first word and the global semantic representation for the word sequence. The word-level feature representation of the first word is adjusted based on the attention coefficient. An emoji likelihood is determined based on the adjusted word-level feature representation of the first word. In accordance with the emoji likelihood satisfying one or more criteria, an emoji character corresponding to the first word is presented for display.
Owner:APPLE INC

Word sense disambiguation method based on hidden Markov model

A word sense disambiguation method based on a hidden Markov model includes: step 1) of corpus training, using a SemEval-2007 task 5 test corpus set to parse a to-be-disambiguated sentence and performing word segmentation on the sentence; step 2) of finding ambiguous words in the sentence after the word segmentation, extracting a target ambiguous word and the segmented words on its left and right, training on the corpora, and calculating the semantic class vocabulary transition probability and the semantic class transition probability; step 3) of extracting the number of sentences containing the ambiguous word from manually annotated corpora, calculating the observation probability, and calculating the observation probability of the words on the left and right of the ambiguous word; step 4) of using the values trained on the preceding corpus to calculate the state transition probability, taking the extracted initial state probability, the observation probability and the state transition probability as parameters of the hidden Markov model, and using the constructed disambiguation model to disambiguate the sentence in the test corpus; and step 5) of using a similarity calculation method to verify the accuracy of the disambiguation results.
Owner:FOCUS TECH
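
Once the initial, transition, and observation probabilities from steps 2 to 4 are available, decoding the most likely sequence of semantic classes is commonly done with the Viterbi algorithm. The abstract does not name its decoding procedure, so the sketch below is an assumption, and the states, words, and probabilities are invented toy values rather than figures estimated from any corpus.

```python
def viterbi(observations, states, start_p, trans_p, emit_p, floor=1e-6):
    """Return the most probable state (semantic class) sequence for the observed words."""
    # best[t][s] = (probability of the best path ending in state s at step t, previous state)
    best = [{s: (start_p[s] * emit_p[s].get(observations[0], floor), None) for s in states}]
    for word in observations[1:]:
        column = {}
        for s in states:
            column[s] = max(
                (best[-1][p][0] * trans_p[p][s] * emit_p[s].get(word, floor), p)
                for p in states
            )
        best.append(column)
    # Backtrack from the most probable final state.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for column in reversed(best[1:]):
        state = column[state][1]
        path.append(state)
    return list(reversed(path))


# Hypothetical toy parameters for two semantic classes of the word "bank".
STATES = ["MONEY", "RIVER"]
START = {"MONEY": 0.5, "RIVER": 0.5}
TRANS = {"MONEY": {"MONEY": 0.8, "RIVER": 0.2},
         "RIVER": {"MONEY": 0.2, "RIVER": 0.8}}
EMIT = {"MONEY": {"deposit": 0.5, "bank": 0.4, "cash": 0.1},
        "RIVER": {"river": 0.5, "bank": 0.4, "water": 0.1}}

print(viterbi(["deposit", "bank"], STATES, START, TRANS, EMIT))
print(viterbi(["river", "bank"], STATES, START, TRANS, EMIT))
```

The `floor` value stands in for smoothing of unseen words. A fuller implementation would work in log space to avoid underflow on long sentences.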

Joint disambiguation of syntactic and semantic ambiguity

Active | US20110125487A1 | Robustly solve | Semantic analysis | Speech recognition | Semantic representation | Syntactic ambiguity
Ambiguities in a natural language expression are interpreted by jointly disambiguating multiple alternative syntactic and semantic interpretations. More than one syntactic alternative, represented by parse contexts, are analyzed together with joint analysis of referents, word senses, relation types, and layout of a semantic representation for each syntactic alternative. Best combinations of interpretations are selected from all participating parse contexts, and are used to form parse contexts for the next step in parsing.
Owner:POSTQ IPR OY

Multi-document subject discovery method based on two-layer clustering

Inactive | CN104778204A | Solve the "non-orthogonal" case | Reduce the dimensionality of the eigenvector space | Special data processing applications | Feature vector | Algorithm
The invention discloses a multi-document subject discovery method based on two-layer clustering. The multi-document subject discovery method comprises the following steps: S1, using a plurality of documents as input and preprocessing each document, i.e., breaking the documents into clauses and the clauses into words, so as to obtain a noun group and a verb group for the multi-document group, and performing semantic disambiguation on polysemous words in the noun group and the verb group; S2, performing word clustering analysis, according to word similarity, on the noun group and the verb group output by step S1 by adopting an improved OPTICS algorithm, extracting semantic concepts, and establishing vector space models of the clauses according to the semantic concepts; S3, performing clustering analysis on the clauses by using an improved K-medoid algorithm, so as to obtain a subject. The method extracts the inner semantic relations between words and solves the problem of non-orthogonality among feature items when feature vectors of the clauses are established.
Owner:SOUTH CHINA UNIV OF TECH +2

Document search system using a meaning relation network

A system allows related documents to be retrieved using conventional search engines while overcoming the ambiguity of a search key entered by the user. The system includes a word sense associative network display portion for displaying word senses of the search key entered by the user together with related word senses in a network, a search portion for conducting a search by generating a search key based on word senses selected by the user, and a filtering portion for selecting documents from the result of the search that match the selected word sense.
Owner:HITACHI LTD

System and method of using POS tagging for symbol assignment

Systems and methods for automatically discovering and assigning symbols for identified text in a software application include identifying text for which symbol assignment is desired. The words within the identified text and selected surrounding words defining an observation sequence are subjected to a part of speech tagging algorithm to electronically determine one or more most likely part of speech tags for the identified text. Context relations between the identified text and selected surrounding keywords may also be identified. The identified text, part of speech tag(s) and / or determined relations are then analyzed to map the identified text to one or more identified word senses. Related word senses may also be analyzed to determine if any related word senses have symbols. One of the determined symbols may then be associated with the identified text such that the symbol is thereafter displayed in conjunction with or instead of the text in the application.
Owner:DYNAVOX SYST

Joint disambiguation of syntactic and semantic ambiguity

Active | US8504355B2 | Robustly solve | Semantic analysis | Speech recognition | Semantic representation | Syntactic ambiguity
Ambiguities in a natural language expression are interpreted by jointly disambiguating multiple alternative syntactic and semantic interpretations. More than one syntactic alternative, represented by parse contexts, are analyzed together with joint analysis of referents, word senses, relation types, and layout of a semantic representation for each syntactic alternative. Best combinations of interpretations are selected from all participating parse contexts, and are used to form parse contexts for the next step in parsing.
Owner:POSTQ IPR OY

Forming method for sentence meaning expression machine translation and electronic dictionary

The present invention provides a method and device for creating a sentence meaning expression, a machine translation method and system, a word and expression interpreting method in which the meaning range is automatically reduced based on the context, and an electronic dictionary in which the meaning range is automatically reduced based on the context. The method of creating the sentence meaning expression for a sentence includes: building a semanteme unit expressing tree index library based on the original-language expressions of a semanteme unit expressing library; extracting, based on the index library, the semanteme unit expressing trees that begin with each word in the sentence; pruning the extracted semanteme unit expressing trees word by word; and finding the sentence meaning expression of the sentence based on the remaining semanteme units.
Owner:高庆狮

Bayesian word sense disambiguation method based on synonym expansion

The invention belongs to the technical field of natural language processing methods, and in particular relates to a Bayesian word sense disambiguation method based on synonym expansion. The method mainly addresses the problems that current word sense disambiguation methods have a poor disambiguation effect and that obtaining disambiguation knowledge is time-consuming and laborious. The Bayesian word sense disambiguation method based on synonym expansion comprises the following steps: (1) expanding the context of a training corpus by adopting a Chinese thesaurus, generating a large number of pseudo training examples; (2) removing noise from the pseudo training examples by utilizing a word collocation corpus, generating a pseudo training corpus; (3) training a Bayesian disambiguation model by adopting the training corpus and the pseudo training corpus simultaneously; and (4) inputting a test corpus into the Bayesian disambiguation model, and jointly determining the word senses of ambiguous words by comprehensively utilizing the disambiguation knowledge in the two corpora.
Owner:SHANXI UNIV
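
Steps (1), (3) and (4) of the abstract above can be sketched with a small naive Bayes classifier trained on both the labeled examples and synonym-expanded pseudo examples. The thesaurus entries, training sentences, and add-one smoothing are assumptions made for illustration, and the noise-removal step (2) is omitted.

```python
import math
from collections import Counter, defaultdict

# Hypothetical thesaurus used to expand training contexts with synonyms.
THESAURUS = {"cash": ["money"], "stream": ["river"]}


def expand_with_synonyms(examples):
    """Create pseudo training examples by swapping one context word for a synonym."""
    pseudo = []
    for words, sense in examples:
        for i, w in enumerate(words):
            for syn in THESAURUS.get(w, []):
                pseudo.append((words[:i] + [syn] + words[i + 1:], sense))
    return pseudo


def train(examples):
    """Count sense frequencies and per-sense context-word frequencies."""
    sense_counts = Counter()
    word_counts = defaultdict(Counter)
    for words, sense in examples:
        sense_counts[sense] += 1
        word_counts[sense].update(words)
    vocab = {w for counts in word_counts.values() for w in counts}
    return sense_counts, word_counts, vocab


def classify(context, model):
    """Return the sense maximizing log P(sense) + sum of log P(word | sense)."""
    sense_counts, word_counts, vocab = model
    total = sum(sense_counts.values())

    def score(sense):
        # Add-one (Laplace) smoothing keeps unseen words from zeroing the product.
        denom = sum(word_counts[sense].values()) + len(vocab)
        return math.log(sense_counts[sense] / total) + sum(
            math.log((word_counts[sense][w] + 1) / denom) for w in context
        )

    return max(sense_counts, key=score)


labeled = [
    (["deposit", "cash", "account"], "bank/finance"),
    (["muddy", "stream", "shore"], "bank/river"),
]
model = train(labeled + expand_with_synonyms(labeled))
print(classify(["money", "account"], model))
print(classify(["river", "shore"], model))
```

The words "money" and "river" never appear in the labeled examples. They reach the model only through the pseudo examples, which is the benefit the synonym expansion is meant to provide.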

Text file word sense disambiguation method and device

The invention discloses a word sense disambiguation method for text files. The method comprises the steps of: configuring multiple reference text contents with determined word senses; obtaining at least one text file to be disambiguated; and, for each text file to be disambiguated, extracting the text content from the text file and performing word segmentation on it to obtain a first word set, determining the words to be disambiguated in the first word set, extracting at least one reference text content corresponding to the words to be disambiguated and performing word segmentation on it to obtain at least one second word set, calculating correlation values between the text file and the reference text contents based on the first word set and the second word sets, and determining that the text file is correlated with the reference text content having the highest correlation value. The text files to be disambiguated are then put in the word sense category corresponding to the correlated reference text content. The invention further discloses a corresponding device. The method and the device can improve disambiguation efficiency.
Owner:TENCENT TECH (SHENZHEN) CO LTD

A text matching method using a semantic parsing structure

The invention discloses a text matching method using a semantic parsing structure. The method comprises the following steps: defining an initial corpus set Cqa and a supplementary corpus set Cq; defining the semantic structure DP-tree corresponding to a text by using a semantic dependency analysis method; defining a kernel function of the text and a metric function of text similarity based on the semantic structure; carrying out kernel clustering on the text; obtaining an aggregated text class function (shown in the specification), wherein i = 1, 2, ..., M and q'ij are the ni sample points selected from each cluster that are closest to the cluster; and, through manual audit, approving the Ci class and marking the Ci class with a specific tag Ti. According to the invention, syntactic analysis structures such as a syntactic structure are used as the basis for comparison; convolution kernel function theory and tree kernels (TK) are combined to define a kernel function representing the distance between two tree syntactic structures; and internal and external knowledge such as syntactic similarity, word vectors and word sense networks is introduced, so that the similarity between texts can be accurately judged.
Owner:ZHONGAN INFORMATION TECH SERVICES CO LTD

Techniques for understanding the aboutness of text based on semantic analysis

In one embodiment of the present invention, a semantic analyzer translates a text segment into a structured representation that conveys the meaning of the text segment. Notably, the semantic analyzer leverages a semantic network to perform word sense disambiguation operations that map text words included in the text segment into concepts (word senses with a single, specific meaning) that are interconnected with relevance ratings. A topic generator then creates topics on-the-fly that include one or more mapped concepts that are related within the context of the text segment. In this fashion, the topic generator tailors the semantic network to the text segment. A topic analyzer processes this tailored semantic network, generating a relevance-ranked list of topics as a meaningful proxy for the text segment. Advantageously, operating at the level of concepts and topics reduces the misinterpretations attributable to key word and statistical analysis methods.
Owner:KLANGOO INC

Context similarity calculation-based word sense disambiguation method

The invention relates to a context similarity calculation-based word sense disambiguation method. The method comprises the steps of: processing training corpora, training a model by using a part-of-speech tagged version of ukWaC; screening parts of speech, reserving only notional words, including nouns, adjectives, adverbs and verbs; training a bidirectional LSTM model by using the corpora subjected to part-of-speech screening; inputting example sentences of to-be-disambiguated words to the bidirectional LSTM model to obtain context vectors; inputting contexts of the to-be-disambiguated words to the bidirectional LSTM model to obtain context vectors of the to-be-disambiguated words; and calculating cosine similarity between the context vectors of the to-be-disambiguated words and the context vectors of the example sentences, and further selecting the semanteme of the to-be-disambiguated words by utilizing a k-nearest-neighbor method according to the obtained similarity result. According to the method, the semanteme is better modeled; the words and their parts of speech are joined directly by an underscore placed after the word; the obtained word vectors distinguish well between different parts of speech of the same word; and the disambiguation accuracy is improved by 0.5% over the experimental baselines.
Owner:SHENYANG AEROSPACE UNIVERSITY
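
The final selection step of the abstract above, cosine similarity between context vectors followed by a k-nearest-neighbor vote, might look like the sketch below. The three-dimensional vectors are hand-written placeholders. In the described method they would come from the trained bidirectional LSTM, which is not reproduced here.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def knn_sense(query_vec, labeled_vecs, k=3):
    """Vote among the k example sentences whose context vectors are most similar."""
    ranked = sorted(labeled_vecs, key=lambda item: cosine(query_vec, item[0]), reverse=True)
    votes = Counter(sense for _, sense in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical context vectors for example sentences, each labeled with a sense.
EXAMPLES = [
    ([0.9, 0.1, 0.0], "bank/finance"),
    ([0.8, 0.2, 0.1], "bank/finance"),
    ([0.1, 0.9, 0.2], "bank/river"),
    ([0.0, 0.8, 0.3], "bank/river"),
]

print(knn_sense([0.85, 0.15, 0.05], EXAMPLES))
print(knn_sense([0.05, 0.9, 0.25], EXAMPLES))
```

An odd `k` avoids most ties between two senses. With more senses or an even `k`, a tie-breaking rule such as summed similarity would be needed.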