
52 results about "Contextual similarity" patented technology

Contextual word similarity is the task of identifying different types of similarity between words, typically by comparing the contexts in which they occur. It is one of the goals of natural language processing. Statistical approaches are commonly used to compute the degree of similarity between words.
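As an illustration of the statistical approach, the sketch below represents each word by counts of the words that appear within a fixed window around it, then compares two words by the cosine of those count vectors. The window size of 2 and the toy corpus are assumptions chosen for the example; practical systems use large corpora and usually reweight the counts (for example with PMI).

```python
import math
from collections import Counter


def context_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` positions of it."""
    vectors = {}
    for tokens in sentences:
        for i, word in enumerate(tokens):
            ctx = vectors.setdefault(word, Counter())
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    ctx[tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[key] for key, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


corpus = [
    ["the", "cat", "drinks", "milk"],
    ["the", "dog", "drinks", "water"],
    ["stocks", "rose", "sharply", "today"],
]
vecs = context_vectors(corpus)
print(round(cosine(vecs["cat"], vecs["dog"]), 3))  # 0.667
```

Here `cat` and `dog` share the context words `the` and `drinks`, so they score 2/3, while `cat` and `stocks` share no context words and score 0.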

Domain entity disambiguation method for fusing word vectors and topic model

The invention relates to a domain entity disambiguation method that fuses word vectors and a topic model, and belongs to the technical field of natural language processing and deep learning. The method comprises the steps of obtaining the candidate entity sets of the entities to be disambiguated; obtaining vector representations of the entities to be disambiguated and of the candidate entities, obtaining the categorical referents of the entities to be disambiguated using a hyponymy-relation domain knowledge base, and computing context similarity and categorical referent similarity; training word vectors on documents from different topic classes using the LDA topic model and a Skip-gram word vector model to obtain word vector representations of the different senses of a polysemous word, extracting the topic domain keywords of a text with the K-Means algorithm, and computing domain topic keyword similarity; and finally fusing the three feature similarities and taking the candidate entity with the highest similarity as the final target entity. The method outperforms conventional disambiguation methods and meets the demands of practical applications.
Owner:KUNMING UNIV OF SCI & TECH
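The abstract above does not say how the three similarities are fused. A minimal sketch, assuming a weighted linear combination with hypothetical weights of 0.4, 0.3 and 0.3, followed by picking the highest-scoring candidate, could look like this; the candidate names and scores are made up for illustration.

```python
def fuse_and_select(candidates, weights=(0.4, 0.3, 0.3)):
    """Pick the candidate entity with the highest fused similarity.

    candidates maps a candidate name to a tuple of
    (context_sim, category_referent_sim, topic_keyword_sim), each in [0, 1].
    The weights are hypothetical; the patent abstract does not give them.
    """
    w_ctx, w_cat, w_topic = weights
    scores = {
        name: w_ctx * ctx + w_cat * cat + w_topic * topic
        for name, (ctx, cat, topic) in candidates.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


candidates = {
    "Apple Inc.": (0.8, 0.9, 0.7),
    "apple (fruit)": (0.3, 0.2, 0.4),
}
best, scores = fuse_and_select(candidates)
print(best)  # Apple Inc.
```

A linear fusion keeps each feature's contribution easy to inspect; a learned combiner would be a natural alternative if labelled data were available.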

System and method for performing a similarity measure of anonymized data

A similarity measure system selects a first value and a first context related to the first value, divides the first value into a first set of substrings in an order-preserving way, and processes each of these substrings through an obfuscation function to produce a first set of obfuscated substrings. The system selects a second value and a second context related to the second value, and processes the second value to produce a second set of obfuscated substrings. The system calculates a context similarity measure for the first context and the second context. The system determines a value similarity measure from the first and second sets of order-preserved obfuscated substrings. The system determines a closeness degree between the first value and the second value and a closeness degree based on the context similarity measure.
Owner:IBM CORP
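A rough sketch of the value-similarity idea above, under several assumptions that the abstract does not state: values are split into consecutive 2-character chunks, each chunk is obfuscated with HMAC-SHA256 under a shared secret key, value similarity is the fraction of aligned positions whose obfuscated chunks match, context similarity is the Jaccard overlap of context labels, and the closeness degree is a weighted sum with a hypothetical weight `alpha`. Because equal chunks produce equal digests under the same key, the comparison never needs the plaintext values.

```python
import hashlib
import hmac


def obfuscate(value, key, size=2):
    """Split value into consecutive substrings (order preserved) and HMAC each one."""
    parts = [value[i:i + size] for i in range(0, len(value), size)]
    return [hmac.new(key, p.encode(), hashlib.sha256).hexdigest() for p in parts]


def value_similarity(obf_a, obf_b):
    """Fraction of aligned positions whose obfuscated substrings are identical."""
    longest = max(len(obf_a), len(obf_b))
    if longest == 0:
        return 1.0
    same = sum(1 for x, y in zip(obf_a, obf_b) if x == y)
    return same / longest


def context_similarity(ctx_a, ctx_b):
    """Jaccard overlap of two sets of context labels (an assumed measure)."""
    a, b = set(ctx_a), set(ctx_b)
    return len(a & b) / len(a | b) if a | b else 1.0


def closeness(value_sim, context_sim, alpha=0.7):
    """Hypothetical weighted combination of value and context similarity."""
    return alpha * value_sim + (1 - alpha) * context_sim


key = b"shared-secret"  # hypothetical key shared by both data owners
a = obfuscate("JOHNSMITH", key)
b = obfuscate("JOHNSMYTH", key)
v = value_similarity(a, b)  # 4 of 5 chunks match
c = context_similarity({"name", "customer"}, {"name", "employee"})
print(round(closeness(v, c), 2))  # 0.66
```

Comparing aligned positions is what makes the order preservation matter: transposed chunks lower the score, which an unordered bag-of-substrings comparison would not detect.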

Method and system for linking entities

The invention discloses a method and a system for linking entities. The method includes acquiring to-be-linked entities from given texts; acquiring entity names and abbreviation word banks from preset knowledge bases and building synonym banks for the entity names from those knowledge bases; searching the synonym banks using entity keywords; if a matching entry is found, linking the search keyword to the corresponding entity name in the preset knowledge base; and if no matching entry is found, generating candidate entities and performing disambiguation linking based on context similarity evaluation. The synonym banks contain the entity names acquired from the preset knowledge bases and related information. The entity keywords are obtained by word segmentation and used as search terms, and each entity name in the knowledge bases corresponds to an entry. The method and system improve entity linking accuracy.
Owner:南京柯基数据科技有限公司 (Nanjing Keji Data Technology Co., Ltd.)
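The two-stage lookup in the entity linking method above might be sketched as follows. The dictionary-based synonym bank, the token-set descriptions of candidate entities, and the Jaccard overlap used as the context similarity are all assumptions for illustration; the abstract does not specify them.

```python
def jaccard(a, b):
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def link_entity(keyword, synonym_bank, candidates, context_tokens):
    """Link a keyword to a knowledge-base entity name.

    synonym_bank: alias or abbreviation -> canonical entity name (hypothetical layout).
    candidates: entity name -> set of description tokens, used only as a fallback.
    """
    # Stage 1: an exact hit in the synonym bank links directly.
    if keyword in synonym_bank:
        return synonym_bank[keyword]
    # Stage 2: no hit, so pick the candidate whose description best matches the context.
    if not candidates:
        return None
    return max(candidates, key=lambda name: jaccard(context_tokens, candidates[name]))


bank = {
    "IBM": "International Business Machines",
    "Big Blue": "International Business Machines",
}
candidates = {
    "Jaguar (car)": {"car", "engine", "brand"},
    "Jaguar (animal)": {"cat", "jungle", "predator"},
}
context = ["the", "jaguar", "hunted", "in", "the", "jungle", "predator"]
print(link_entity("Big Blue", bank, candidates, context))  # International Business Machines
print(link_entity("jaguar", bank, candidates, context))    # Jaguar (animal)
```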

Clique based clustering for named entity recognition system

A soft clustering method comprises (i) grouping items into non-exclusive cliques based on features associated with the items, and (ii) clustering the non-exclusive cliques using a hard clustering algorithm to generate item groups on the basis of mutual similarity of the features of the items constituting the cliques. In some named entity recognition embodiments illustrated herein as examples, named entities together with contexts are grouped into cliques based on mutual context similarity. Each clique includes a plurality of different named entities having mutual context similarity. The cliques are clustered to generate named entity groups on the basis of mutual similarity of the contexts of the named entities constituting the cliques.
Owner:XEROX CORP

Click rate estimating method and device and electronic equipment

CN106372249A (Active)
The invention provides a click rate estimating method in the technical field of computers. The method comprises the following steps: setting a click label for each exposure log according to a click log; setting an exposure weight for each exposure log based on its click label and the contextual similarity of the page element; and performing click rate estimation using the weighted exposure logs. Prior-art click rate estimation does not account for differences in how effective a page element's exposure is in different contexts, which lowers its accuracy. By weighting each exposure log according to its click label and the recorded contextual similarity of the page element, and introducing these weights into the estimation, the method produces a more accurate click rate estimate.
Owner:BEIJING SANKUAI ONLINE TECH CO LTD
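The abstract above does not give the weighting formula, so the following is only a hypothetical sketch: clicked exposures keep weight 1, and each unclicked exposure is weighted by a contextual similarity score in [0, 1], so that impressions shown in contexts unlike the reference context count for less. The weighted click rate is the weighted clicks divided by the total weight.

```python
def weighted_ctr(exposures):
    """Estimate click rate from (clicked, context_similarity) exposure records.

    Hypothetical weighting: a clicked exposure counts fully; an unclicked one
    counts in proportion to its contextual similarity in [0, 1].
    """
    num = den = 0.0
    for clicked, sim in exposures:
        weight = 1.0 if clicked else sim
        num += weight * clicked  # clicked is a bool, so this adds weight or 0
        den += weight
    return num / den if den else 0.0


log = [(True, 0.9), (False, 1.0), (False, 0.2), (False, 0.2)]
print(round(weighted_ctr(log), 3))  # 0.417
```

With one click in four exposures the naive rate is 0.25; down-weighting the two low-similarity unclicked exposures raises the estimate to 1/2.4.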

Document classification method and document classification device

The invention discloses a document classification method and a document classification device. The method includes: extracting feature text from a target document and using it to form search conditions; searching with those conditions to obtain relevant search results; calculating the text similarity between the target document and each search result; and classifying the target document according to the calculated similarities and the classification information of the search results. Because documents with similar text content are likely to belong to the same class, classifying a new document through statistical computation over the existing classification information of similar documents yields a classification result with high confidence.
Owner:BEIJING BAIDU NETCOM SCI & TECH CO LTD
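One simple way to turn the similarities between the target document and its search results into a class and a confidence is a similarity-weighted vote, much like weighted k-nearest neighbours. This is an assumed stand-in for the statistical computation the abstract above mentions; the labels and scores are hypothetical.

```python
from collections import defaultdict


def classify_by_neighbours(similarities):
    """Return (label, confidence) from (similarity_to_target, label) pairs.

    Each retrieved document votes for its label with its similarity; the
    confidence is the winning label's share of the total similarity.
    """
    totals = defaultdict(float)
    for sim, label in similarities:
        totals[label] += sim
    if not totals:
        return None, 0.0
    best = max(totals, key=totals.get)
    return best, totals[best] / sum(totals.values())


results = [(0.9, "sports"), (0.8, "sports"), (0.7, "finance"), (0.1, "finance")]
label, confidence = classify_by_neighbours(results)
print(label, round(confidence, 2))  # sports 0.68
```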

Named entity identification method and device

The invention provides a named entity identification method and device. An initially constructed first sequence annotation model is used to obtain a first entity probability distribution for a training document and a second entity probability distribution for a test document. Features can then be extracted from social network information, such as a first context similarity and a first object similarity for the training document, and a second context similarity and a second object similarity for the test document. A second sequence annotation model is trained on the first context similarity and first object similarity of the training document, making it better suited to social network text. Sequence annotation of the test document with this second model yields more accurate named entity identification results.
Owner:CHINA CONSTRUCTION BANK

Rearrangement method and system based on document similarity

The invention discloses a rearrangement method and a rearrangement system based on document similarity, relating to the field of text similarity calculation and detection. The method comprises the following steps: extracting the documents to be compared and generating plain text; normalizing the plain text to generate normalized text units; encoding the normalized text units with an encoding algorithm to generate irreversible, fixed-length representative codes; extracting keywords from the representative codes of the documents to be compared to generate keyword sequences; calculating the word form similarity and the word order similarity of the sentences to be compared from their keyword sequences; calculating the similarity of the sentences from their word form similarity and word order similarity; and calculating the similarity of the documents from the sentence similarities. The method and system are suitable for Chinese text, convenient for users in China, and achieve higher precision in similar-document comparison.
Owner:HUAZHONG UNIV OF SCI & TECH
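The rearrangement abstract above names word form similarity and word sequence (order) similarity without giving formulas. The sketch below assumes one common formulation from sentence-similarity work: word form similarity is twice the number of shared word occurrences divided by the total number of words, and word order similarity is 1 minus the number of adjacent reversed pairs among words that occur exactly once in both sentences, divided by that word count minus one. The 0.8/0.2 combination weights and the best-match averaging for documents are hypothetical.

```python
from collections import Counter


def word_form_similarity(s1, s2):
    """2 * (shared word occurrences) / (total words in both sentences)."""
    if not s1 and not s2:
        return 1.0
    shared = sum((Counter(s1) & Counter(s2)).values())
    return 2 * shared / (len(s1) + len(s2))


def word_order_similarity(s1, s2):
    """1 - (adjacent reversed pairs) / (n - 1), over words occurring once in both."""
    once = [w for w in s1 if s1.count(w) == 1 and s2.count(w) == 1]
    if len(once) < 2:
        return 1.0 if once else 0.0
    positions = [s2.index(w) for w in once]  # where s1's shared words sit in s2
    reversed_pairs = sum(1 for p, q in zip(positions, positions[1:]) if p > q)
    return 1 - reversed_pairs / (len(once) - 1)


def sentence_similarity(s1, s2, form_weight=0.8, order_weight=0.2):
    return (form_weight * word_form_similarity(s1, s2)
            + order_weight * word_order_similarity(s1, s2))


def document_similarity(doc1, doc2):
    """Average, over the sentences of doc1, of each sentence's best match in doc2."""
    if not doc1 or not doc2:
        return 0.0
    return sum(max(sentence_similarity(a, b) for b in doc2) for a in doc1) / len(doc1)


s1 = ["we", "like", "green", "tea"]
s2 = ["green", "tea", "we", "like"]
print(round(sentence_similarity(s1, s2), 3))  # 0.933
```

The two sentences use identical words, so word form similarity alone would call them equal; the order term is what separates them.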

Acquisition method and acquisition system for similarity of vocabularies between different languages

The invention discloses an acquisition method and an acquisition system for the similarity of vocabularies between different languages, which obtain the similarity of words across languages from the context vocabulary similarity and the dependency similarity of the words in the source language and the target language. Because context vocabulary similarity and dependency similarity are used together to evaluate cross-language vocabulary similarity, the reliability of the similarity is effectively enhanced and translation accuracy is improved.
Owner:SUZHOU UNIV

Text similarity calculation method and device and storage medium

A text similarity calculation method, in the technical field of computer applications, comprises the following steps: performing word segmentation on each of two texts to obtain two first vocabulary sets, and calculating a first similarity of the two texts from these sets; inputting the two texts into a preset N-gram language model to obtain two second vocabulary sets, and calculating a second similarity of the two texts from these sets; and calculating the similarity of the two texts from the first and second similarities according to a preset adjustment parameter for each. The invention further provides a text similarity calculation device and a storage medium. Because both the semantic similarity and the word-level similarity of the texts are taken into account, the text similarity is calculated more accurately.
Owner:武汉瓯越网视有限公司 (Wuhan Ouyue Netvision Co., Ltd.)
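A sketch of the two-part combination in the text similarity method above, with stand-ins for the unspecified pieces: Jaccard overlap of the segmented word sets as the first similarity, Jaccard overlap of character bigrams as a simple substitute for the N-gram language model output, and hypothetical adjustment parameters `a` and `b` used as normalized weights.

```python
def jaccard(x, y):
    x, y = set(x), set(y)
    return len(x & y) / len(x | y) if x | y else 1.0


def char_ngrams(text, n=2):
    """Character n-grams, used here in place of an N-gram language model."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def text_similarity(words1, words2, text1, text2, a=0.5, b=0.5):
    """Blend word-set similarity and n-gram similarity with adjustment parameters a, b."""
    first = jaccard(words1, words2)
    second = jaccard(char_ngrams(text1), char_ngrams(text2))
    return (a * first + b * second) / (a + b)


t1, t2 = "deep learning model", "deep learning models"
sim = text_similarity(t1.split(), t2.split(), t1, t2)
print(round(sim, 3))  # 0.722
```

The word sets alone treat `model` and `models` as unrelated (similarity 0.5), while the bigram sets share 17 of 18 entries, so the blended score credits the near match.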

Code fragment recommendation method considering code statement sequence information

The invention discloses a code fragment recommendation method that considers code statement sequence information, comprising the following steps: obtaining the current code context, formatting it, extracting structural information and declared variable types, and converting the cleaned code fragment into an LC sequence; calculating the BWT similarity and variable type similarity between the code fragments in a code database and the current code context, and taking the code fragments most similar to the current code context as a candidate set; and reordering the candidate fragments by their BWT similarity, variable similarity, and structural similarity to the current code context, then presenting the reordered list to the user. Compared with the prior art, using the sequence information between code statements makes the recommended code snippets more similar to the query, so users can make better use of them. In addition, introducing structural information into the ranking process improves the ranking.
Owner:NANJING UNIV OF AERONAUTICS & ASTRONAUTICS

Method and apparatus for selecting content segments

A method, apparatus and computer program product select content segments based at least in part on a contextual similarity level between the content segments and at least one of brightness levels, blur levels, and shake levels of the content segments. As such, a resultant video may be produced that comprises selected content segments. Accordingly, brightness levels of the content may be improved and shake and blur levels reduced while maintaining a desired field of view.
Owner:RPX CORP

Knowledge base fusion method for encyclopedia websites

The invention provides a knowledge base fusion method for encyclopedia websites, used to fuse the knowledge cards (infoboxes) of Baidu Baike, Hudong Baike (Interactive Encyclopedia), and the Chinese Wikipedia, currently the most influential Chinese encyclopedias. The method comprises the following steps: step 1, acquiring and preprocessing the query results of the encyclopedia websites for the same entity; step 2, establishing a mapping between entities across the encyclopedia websites by combining concept similarity, attribute similarity, and context similarity features; step 3, aligning the attributes of the knowledge cards of the mapped entities using an external dictionary; step 4, for attributes with conflicting values, applying a single-truth discovery scheme or a multi-truth discovery scheme depending on whether the attribute is single-valued or multi-valued; and step 5, outputting the fused attribute-value pairs. The result is a set of high-reliability, entity-related attribute-value pairs fused from the knowledge cards of the three encyclopedias.
Owner:HOHAI UNIV
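Step 4 of the fusion method above distinguishes single-valued and multi-valued attributes. A minimal sketch, assuming plain majority voting for single-valued attributes and a minimum-support threshold for multi-valued ones, could look like this; the patent's actual truth discovery schemes may also weight sources by reliability, which is not modelled here.

```python
from collections import Counter


def single_truth(claims):
    """Majority vote over the values different sources report (ties go to the first seen)."""
    return Counter(claims).most_common(1)[0][0]


def multi_truth(claims, min_support=2):
    """Keep every value reported by at least min_support sources.

    claims holds one list of values per source; duplicates within a source count once.
    """
    support = Counter(v for values in claims for v in set(values))
    return sorted(v for v, n in support.items() if n >= min_support)


# Hypothetical conflicting claims from three encyclopedias.
print(single_truth(["1950-03-01", "1950-03-01", "1950-01-03"]))          # 1950-03-01
print(multi_truth([["actor", "singer"], ["actor"], ["singer", "writer"]]))  # ['actor', 'singer']
```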

Entity linking method based on multi-domain entity indexes

CN106934020A (Active)
The present invention discloses an entity linking method based on multi-domain entity indexes. The method comprises two main steps: (1) building multi-domain indexes for the entities in the knowledge base; and (2) screening candidate entities using the multi-domain indexes, reordering them by context similarity score and popularity score, and linking the entity mention to the highest-scoring entity. The method does not need an alias dictionary to find candidate entities: indexes are built for the subdomains corresponding to different entity attributes (relations) in the knowledge base, and candidates matching the entity mention are retrieved by searching the name domain. For these preliminarily screened candidates, context scores and popularity scores are calculated from the other domain indexes, the candidates are reordered, and the entity mention is linked to the highest-scoring entity.
Owner:SOUTHEAST UNIV

Method for solving text similarity based on Gini index

The invention discloses a method for computing text similarity based on the Gini index. The method comprises the following steps: segmenting the text into words using word segmentation technology, removing stop words by matching against a stop word list, and obtaining the positions of the vocabulary items and their weighted feature values from statistics; selecting the text vocabulary and reducing its dimensionality using a target weight function (given in the patent description), merging highly similar words according to semantic similarity, reducing the dimensionality of the resulting feature words again, and computing the similarity between texts as the similarity between their vectors. Compared with traditional methods for extracting characteristic text vocabulary, the method is more accurate, has greater practical value, and processes data well. It overcomes the shortcomings of the information gain method, yields results closer to empirical values, addresses the high-dimensional sparsity of text feature vocabularies as well as synonymy and polysemy, and measures how much each word contributes to the meaning of a text, providing a sound theoretical basis for subsequent text similarity computation and text clustering.
Owner:SICHUAN YONGLIAN INFORMATION TECH CO LTD
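The target weight function in the Gini-index method above is only referenced, so the sketch below substitutes a standard Gini-index (purity) term weight, Gini(t) = sum over classes c of P(c | t)^2, estimated from document counts in a labelled corpus. Terms spread evenly across classes get low weights and class-specific terms get weights near 1; term frequencies are multiplied by these weights and texts are compared by cosine similarity. The toy corpus is hypothetical.

```python
import math
from collections import Counter, defaultdict


def gini_weights(labelled_docs):
    """Gini(t) = sum_c P(c | t)^2, from (tokens, class) pairs using document counts."""
    doc_freq = defaultdict(Counter)  # term -> class -> number of docs containing the term
    for tokens, label in labelled_docs:
        for term in set(tokens):
            doc_freq[term][label] += 1
    weights = {}
    for term, per_class in doc_freq.items():
        total = sum(per_class.values())
        weights[term] = sum((n / total) ** 2 for n in per_class.values())
    return weights


def weighted_vector(tokens, weights):
    """Term frequency scaled by the term's Gini weight (unknown terms get 0)."""
    return {t: c * weights.get(t, 0.0) for t, c in Counter(tokens).items()}


def cosine(u, v):
    dot = sum(x * v.get(k, 0.0) for k, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


docs = [
    (["goal", "match", "the"], "sport"),
    (["match", "team", "the"], "sport"),
    (["stock", "market", "the"], "finance"),
]
w = gini_weights(docs)
print(round(w["the"], 3))  # 0.556, because "the" appears in both classes
u = weighted_vector(["the", "goal"], w)
v = weighted_vector(["the", "stock"], w)
print(round(cosine(u, v), 3))  # 0.236
```

Without the weights the two short texts would score 0.5 because they share `the`; the Gini weight shrinks that common word's influence.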

Entity relation extracting method and device for text processing

The invention discloses an entity relation extraction method and device for text processing. The method comprises: inputting a text to be processed that contains multiple entities; identifying the entities in the text; screening the entities according to preset examples to obtain the context features of input instances; calculating, from these context features, the context similarity between each input instance and the seed examples in a seed example library; determining whether each context similarity exceeds a first preset threshold; counting the seed examples whose similarity exceeds the first threshold; determining whether that count exceeds a second preset threshold; and, if it does, taking the input instance as an entity relation instance produced by the text processing. The method and device address the problem that rule-based methods have high precision but low recall.
Owner:DATAGRAND TECH INC
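The two-threshold acceptance rule in the relation extraction method above can be sketched as below. Jaccard overlap of context-word sets stands in for the unspecified context similarity, and the thresholds `t1` and `t2` are hypothetical values.

```python
def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def accept_instance(instance_ctx, seed_ctxs, t1=0.6, t2=2, sim=jaccard):
    """Accept the instance if more than t2 seed examples have similarity above t1."""
    votes = sum(1 for seed in seed_ctxs if sim(instance_ctx, seed) > t1)
    return votes > t2


# Hypothetical context words of seed examples for a "founded-by" relation.
seeds = [
    {"founded", "by"},
    {"founder", "of"},
    {"founded", "by", "in"},
    {"was", "founded", "by"},
]
print(accept_instance({"founded", "by"}, seeds))        # True: 3 seeds exceed 0.6
print(accept_instance({"founded", "by"}, seeds, t2=3))  # False
```

Requiring agreement from several seeds, rather than a single close match, is what lets the rule accept more instances than a hand-written pattern while limiting false positives.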

Prediction System for Geographical Locations of Users Based on Social and Spatial Proximity, and Related Method

Determining a location of a user on a social network platform is difficult due to incorrect information or lack of information associated with the user. A system and method are provided to compute contextual similarity. This includes, for example, computing content similarity between seed users and followers / friends, as well as computing an engagement score between seed users and followers / friends. The system also computes geo-social-spatial similarity. The similarity scores are used in an inference computation to infer the geo-locations of the followers of the seed users, and of subject users who share common friends with the seed users. The user geo-location inference database is updated using the result. Other seed users are selected, and the process is repeated.
Owner:MELTWATER NEWS INT HLDG +1

Text similarity ordering method based on ES search

The invention belongs to the technical field of big data and discloses a text similarity ordering method based on ES (Elasticsearch) search. The method takes the positional order of words into account when calculating the similarity between texts. This solves the problem that ES search cannot correctly order texts that contain the same words in a different order, and improves the accuracy of ES text similarity ordering. The method comprises the following steps: a, performing a preliminary ES search to obtain a set of similar texts; b, segmenting the texts into words to obtain segmented word sets; c, vectorizing the segmented texts based on the segmented word sets; d, measuring the similarity between text vectors by cosine similarity; and e, reordering the texts by their cosine similarity values.
Owner:SICHUAN CHANGHONG ELECTRIC CO LTD
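Cosine similarity over plain word counts ignores word order, so the sketch below assumes one way to bring order into steps c to e of the method above: each text vector counts both single words and adjacent word pairs. The ES (Elasticsearch) hits are represented as already-tokenized lists; no Elasticsearch client is used, and `rerank` is a hypothetical helper.

```python
import math
from collections import Counter


def ordered_vector(tokens):
    """Count single words plus adjacent word pairs, so word order changes the vector."""
    vec = Counter(tokens)
    vec.update(zip(tokens, tokens[1:]))  # pair keys are tuples like ("dog", "bites")
    return vec


def cosine(u, v):
    dot = sum(count * v[key] for key, count in u.items())
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0


def rerank(query_tokens, hits):
    """Reorder preliminary search hits, given as (doc_id, tokens), by cosine similarity."""
    q = ordered_vector(query_tokens)
    scored = [(cosine(q, ordered_vector(tokens)), doc_id) for doc_id, tokens in hits]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [doc_id for _, doc_id in scored]


hits = [("a", ["man", "bites", "dog"]), ("b", ["dog", "bites", "man"])]
print(rerank(["dog", "bites", "man"], hits))  # ['b', 'a']
```

Both hits contain exactly the query's words, so a unigram-only vector would tie them; the pair counts give the correctly ordered text a score of 1.0 and the reversed one 0.6.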

Knowledge distillation-based unsupervised industrial image anomaly detection method and system

CN114240892A (Pending)
The invention discloses an unsupervised industrial image anomaly detection method and system based on knowledge distillation, in the technical field of industrial image processing. The method comprises a training stage and a testing stage and consists of multi-scale knowledge distillation and multi-scale anomaly fusion. The multi-scale knowledge distillation uses a teacher network and a student network; hard samples are mined dynamically by adaptive hard sample mining, and the student network is optimized using pixel-level similarity and context similarity between the hard samples. In the training stage, knowledge is distilled from the teacher network to the student network using only normal industrial images, and the student network parameters are optimized iteratively so that the deep features the student and teacher networks extract from normal product images are similar. In the testing stage, both networks extract deep features from a test image, and the regression error between these features is used for image anomaly segmentation and detection. The method effectively improves unsupervised industrial image anomaly detection, reduces labor costs, and raises the level of automation and intelligence in production line quality inspection.
Owner:HUAZHONG UNIV OF SCI & TECH
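In the testing stage of the anomaly detection method above, the anomaly signal is the regression error between teacher and student features. A dependency-free sketch, assuming the feature maps are given as H x W grids of feature vectors already resized to a common resolution, computes the squared L2 error per location, averages the maps across scales as a simple stand-in for multi-scale anomaly fusion, and takes the maximum as the image-level score. The networks themselves are out of scope here.

```python
def anomaly_map(teacher, student):
    """Per-location squared L2 error between teacher and student feature vectors.

    teacher and student are H x W grids (lists of rows) of equal-length feature vectors.
    """
    return [
        [sum((t - s) ** 2 for t, s in zip(t_vec, s_vec)) for t_vec, s_vec in zip(t_row, s_row)]
        for t_row, s_row in zip(teacher, student)
    ]


def fuse_scales(maps):
    """Average several H x W anomaly maps (assumed already resized to one resolution)."""
    height, width = len(maps[0]), len(maps[0][0])
    return [
        [sum(m[i][j] for m in maps) / len(maps) for j in range(width)]
        for i in range(height)
    ]


def image_score(fused):
    """Image-level anomaly score: the largest per-location error."""
    return max(max(row) for row in fused)


teacher = [[[1.0, 0.0], [0.0, 1.0]]]  # a 1 x 2 grid of 2-d features
student = [[[1.0, 0.0], [1.0, 1.0]]]  # the student deviates at the second location
scale1 = anomaly_map(teacher, student)
scale2 = [[0.0, 3.0]]  # hypothetical map from a coarser scale
fused = fuse_scales([scale1, scale2])
print(fused, image_score(fused))  # [[0.0, 2.0]] 2.0
```

Because the student only learns to imitate the teacher on normal images, large errors concentrate where the input departs from normal appearance, which is what makes the per-location map usable for segmentation.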

Service regression detection using real-time anomaly detection of log data

The present system provides continuous delivery and real-time service regression detection based on log data. The log data is clustered by textual and contextual similarity, and the clusters can serve as an indicator of the behavior of a service or application. Each cluster can be augmented with the frequency distribution of its occurrences, bucketed at a temporal level. Collectively, the textual and contextual similarity clusters serve as a strong signature (e.g., a learned representation) of the current service state and a strong indicator for predicting future behavior. Machine learning techniques generate a signature from the log data that represents the current state of the service and predicts its future behavior at any instant in time.
Owner:HARNESS INC
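A minimal sketch of the clustering-plus-frequency idea in the regression detection system above, assuming a simple notion of textual similarity (messages that become identical after masking numbers, hex IDs and dotted addresses share a cluster) and using the emitting service as the context. Each cluster keeps a per-minute occurrence histogram. The regex and record layout are assumptions, and the machine-learning signature step is not shown.

```python
import re
from collections import Counter, defaultdict

# Hypothetical masking rule: hex ids, integers, and dotted numbers such as IPs.
VARIABLE = re.compile(r"\b(?:0x[0-9a-fA-F]+|\d+(?:\.\d+)*)\b")


def template(message):
    """Replace variable-looking tokens with a placeholder to expose the message shape."""
    return VARIABLE.sub("<*>", message)


def cluster_logs(records, bucket_seconds=60):
    """Group (unix_ts, service, message) records by (service, template).

    Returns {(service, template): Counter(bucket_start -> occurrence count)}.
    """
    clusters = defaultdict(Counter)
    for ts, service, message in records:
        bucket = int(ts) // bucket_seconds * bucket_seconds
        clusters[(service, template(message))][bucket] += 1
    return clusters


records = [
    (0, "api", "timeout after 30 ms"),
    (10, "api", "timeout after 45 ms"),
    (70, "api", "timeout after 5 ms"),
    (75, "db", "timeout after 30 ms"),
]
for key, histogram in cluster_logs(records).items():
    print(key, dict(histogram))
```

Comparing these per-cluster histograms between a baseline window and the current deployment is one way such clusters could flag a regression, for example a new template appearing or a known one spiking.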

Graph model filtering method fusing shallow semantic information

The invention provides a graph model filtering method that fuses shallow semantic information. The method comprises the steps of: inputting a Chinese entity mention into a mention extension method to obtain an accurate and complete entity mention; using the entity mention as the key field of a wiki search in a Chinese Wikipedia knowledge base to obtain a list of candidate entities for the mention; filtering the candidate entity list with the graph model filtering method that fuses shallow semantic information; and storing the filtered candidate entity list in a database for the entity disambiguation module. The method computes the context similarity between each candidate entity and the entity mention as a text similarity that fuses shallow semantic information, and uses it as one weight factor of the filtering algorithm; it computes candidate entity relevance with a graph model in-degree/out-degree algorithm as a second weight factor; finally, it fuses the two weight factors into a comprehensive score for ranking the candidates, which reduces entity disambiguation errors.
Owner:UNIV OF ELECTRONICS SCI & TECH OF CHINA

Demand entity co-reference detection method and device based on deep learning and context semantics

CN111950281A (Active)
The invention discloses a demand (requirement) entity co-reference detection method and device based on deep learning and context semantics. The method comprises the following steps: 1) context interception: first locating an entity, then cutting a window of text of a given size centered on the entity, and taking this requirement text as the entity's context; 2) constructing a context similarity network consisting of two parts, a fine-tuned BERT model that learns context representations and a Word2Vec-based network that learns entity representations; the context and the entity are fed into the BERT model and the Word2Vec network respectively, and the two resulting vector representations are concatenated; finally, a multi-layer perceptron and a softmax layer predict the label, i.e., whether the two entities are co-referent. The method addresses the entity co-reference problem in natural language requirements and helps stakeholders from multiple domains reach consensus on entities.
Owner:INST OF SOFTWARE - CHINESE ACAD OF SCI
