47 results about "Concept vector" patented technology

Information data retrieval, where the data is organized in terms, documents and document corpora

The invention relates to improved solutions for information retrieval, wherein the information is represented by digitized text data. This data is further presumed to be organized in terms (431-438), documents and document corpora, where each document contains at least one term (431-438) and each document corpus contains at least one document. Based on a concept vector (420-424), which conceptually classifies the contents of each document, a term-to-concept vector is generated for each term (431-438) in the document corpus. The term-to-concept vector describes a relationship between the term (431) and each of the concept vectors (420-424). On the basis of the term-to-concept vectors for the document corpus, a term-term matrix is generated which describes a term-to-term relationship between all the terms (431-438) in the document corpus. The term-term matrix may then be processed and used for retrieving information from the document corpus, such as the fact that a first term (431) is related to a second term (436).
Owner:ELUCIDON GROUP
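
As a rough illustration of the pipeline in the abstract above, the following sketch builds term-to-concept vectors and a term-term matrix for a toy corpus. The abstract does not say how the vectors are aggregated or compared, so two choices here are assumptions: a term's vector is the sum of the concept vectors of the documents that contain it, and the term-term relationship is cosine similarity. The corpus, the function names and all numbers are hypothetical.

```python
import math

# Hypothetical toy corpus: each document has a term list and a concept vector
# (weights over three made-up concepts). None of these values come from the patent.
docs = [
    {"terms": ["engine", "car"], "concepts": [1.0, 0.0, 0.0]},
    {"terms": ["car", "road"],   "concepts": [0.8, 0.2, 0.0]},
    {"terms": ["bank", "loan"],  "concepts": [0.0, 0.0, 1.0]},
]

def term_to_concept(docs):
    """Assumed aggregation: sum the concept vectors of every document containing the term."""
    vectors = {}
    for doc in docs:
        for term in set(doc["terms"]):
            acc = vectors.setdefault(term, [0.0] * len(doc["concepts"]))
            for i, weight in enumerate(doc["concepts"]):
                acc[i] += weight
    return vectors

def cosine(u, v):
    """Cosine similarity of two dense vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def term_term_matrix(vectors):
    """Assumed relationship measure: pairwise cosine between term-to-concept vectors."""
    terms = sorted(vectors)
    return terms, [[cosine(vectors[a], vectors[b]) for b in terms] for a in terms]

t2c = term_to_concept(docs)
terms, matrix = term_term_matrix(t2c)
```

With this toy data, "car" and "engine" end up strongly related, while "car" and "bank" share no concept weight and score zero.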

Method and system for classifying documents

The invention provides a method and system for classifying insurance files for identification, sorting and efficient collection of subrogation claims. The invention determines whether an insurance claim has merit to warrant claim recovery efforts utilizing software code for partially describing a set of documents having unstructured and structured file data containing terms and phrases having contextual bases, code for transforming the terms and phrases, code for iterating a classification process to determine rules that best classify the set of documents based upon context, code for incorporating the rules into an induction and knowledge representation, thesauri taxonomies and text summarization to classify subrogation claims; code for calculating a base score and a concept vector to identify the selected claims that demonstrate a given probability of subrogation recovery.
Owner:HARTFORD FIRE INSURANCE

System and method for analyzing electronic data records

A system and method for analyzing electronic data records including an annotation unit being operable to receive a set of electronic data records and to compute concept vectors for the set of electronic data records, wherein the coordinates of the concept vectors represent scores of the concepts in the respective electronic data record and wherein the concepts are part of an ontology, a similarity network unit being operable to compute a similarity network by means of the concept vectors and by at least one relationship between the concepts of the ontology, the similarity network representing similarities between the electronic data records, wherein the vertices of the similarity network represent the electronic data records and the edges of the similarity network represent similarity values indicating a degree of similarity between the vertices and steps for executing the system.
Owner:IBM CORP
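
The similarity network described above can be sketched as follows. The abstract leaves open how the ontology relationship enters the computation, so this example assumes one simple option: each concept's score is also added to its parent concept before records are compared by cosine similarity. The ontology, the record data, the threshold and the function names are all hypothetical.

```python
import math
from itertools import combinations

# Hypothetical ontology relationship: child concept -> parent concept.
PARENT = {"aspirin": "analgesic", "ibuprofen": "analgesic"}

# Hypothetical sparse concept vectors (concept -> score), one per data record.
records = {
    "r1": {"aspirin": 1.0},
    "r2": {"ibuprofen": 1.0},
    "r3": {"fracture": 1.0},
}

def expand(vec):
    """Assumed use of the ontology: add each concept's score to its parent concept."""
    out = dict(vec)
    for concept, score in vec.items():
        parent = PARENT.get(concept)
        if parent is not None:
            out[parent] = out.get(parent, 0.0) + score
    return out

def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(c, 0.0) for c, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def similarity_network(records, threshold=0.1):
    """Vertices are record ids; an edge carries the similarity value if it reaches the threshold."""
    expanded = {rid: expand(vec) for rid, vec in records.items()}
    edges = {}
    for a, b in combinations(sorted(expanded), 2):
        sim = cosine(expanded[a], expanded[b])
        if sim >= threshold:
            edges[(a, b)] = sim
    return edges

edges = similarity_network(records)
```

Here r1 and r2 mention different drugs but become connected through the shared parent concept, while r3 stays isolated.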

Multi-subject extracting method based on semantic categories

The invention provides a multi-subject extracting method based on semantic categories, comprising the following steps. First, a document is preprocessed by a traditional method to obtain a preliminary vector of feature words. Second, synonyms are merged using the correspondence between word meanings and concepts in 'HowNet', polysemous words are disambiguated according to the correlation between the semantic categories and the context, and a concept vector model is constructed to represent the document. The concept vector model is then converted into a semantic category model according to the one-to-one correspondence between the concepts and the semantic categories. The concept similarity is calculated from the related semantic information of the concepts in 'HowNet', and the semantic similarity is then obtained. The semantic categories are clustered with a K-means algorithm improved by presetting seeds, forming a plurality of subject semantic category clusters. Finally, a plurality of sub-subject word sets are obtained in reverse from the correspondences between the semantic categories and the concepts and between the concepts and words. The method takes semantic information into account, overcomes the K-means algorithm's sensitivity to the initial centers and its unstable time and space cost, and improves the quality of the extracted subjects.
Owner:HOHAI UNIV

Multi-document summarization method based on text segmentation

The invention belongs to the technical field of multi-document summarization and provides a multi-document summarization method based on text segmentation, which comprises the following steps: using HowNet to obtain concepts, building a concept vector space model, conducting text segmentation with an improved DotPlotting model and a sentence concept vector space, calculating sentence weights with the concept vector space model, generating a summary according to the sentence weights, the text segmentation and the similarity, and evaluating the generated summary with the ROUGE-N evaluation method, using F_Score as the evaluation index. According to the results, multi-document summarization using text segmentation is effective: relevant documents provided by users can be gathered into a summary and displayed to them in a suitable way, which greatly improves the efficiency of information acquisition. The method is highly practical and has considerable value for wider application.
Owner:广西超宏科技有限公司

Obtaining and Using a Distributed Representation of Concepts as Vectors

An approach is provided for automatically generating and processing concept vectors by extracting concept sequences from one or more content sources and generating a first concept vector for a first concept by supplying the concept sequences as inputs to a vector learning component, such that the first concept vector comprises information interrelating the first concept to other concepts in the concept sequences which is inferred from the concept sequences.
Owner:IBM CORP

Multi-subject extraction method based on concept vector model

Inactive; CN104008090A; Special data processing applications; Concept cluster; Algorithm
The invention provides a multi-subject extraction method based on a concept vector model. The method includes the following steps: first, a document is preprocessed by a traditional method to obtain a preliminary vector of feature words; then synonyms are merged using the correspondence between word meanings and concepts in 'HowNet', polysemous words are disambiguated through the correlation between semantic classes and contexts, and the concept vector model is established to represent the document; concept similarity is calculated from the related semantic information of the concepts in 'HowNet', and the concepts are clustered with a K-means algorithm improved by a 'preset seed' method, forming a plurality of subject concept clusters; finally, a plurality of sub-subject term sets are acquired according to the correspondence between the concepts and words. The method takes semantic information into account, overcomes defects such as the sensitivity of the K-means algorithm to the initial centers and its unstable space-time cost, and improves the quality of the extracted subjects.
Owner:HOHAI UNIV

Analyzing Concepts Over Time

A method and apparatus are provided for automatically generating and processing first and second concept vector sets extracted, respectively, from a first set of concept sequences and from a second, temporally separated set of concept sequences, by performing a natural language processing (NLP) analysis of the first concept vector set and the second concept vector set to detect changes in the corpus over time by identifying changes for one or more concepts included in the first and/or second set of concept sequences.
Owner:IBM CORP

Method for extracting multiple subject terms from single Chinese text

Inactive; CN103970730A; Special data processing applications; Concept cluster; Spacetime
The invention provides a method for extracting multiple subject terms from a single Chinese text. The method comprises the following steps: first, the text is preprocessed by a traditional method to obtain a preliminary vector of feature words; second, synonyms are merged using the correspondence between word meanings and concepts in 'HowNet', polysemous words are disambiguated according to the correlation between semantic types and the context, and a concept vector model is built to represent the text; third, concept similarity is calculated from the related semantic information of the concepts in 'HowNet', and the concepts are clustered with a K-means algorithm improved by a 'seed presetting' method, forming a plurality of subject concept clusters; fourth, a plurality of sub-subject terms are obtained according to the correspondence between the concepts and the words. The method takes semantic information into account, overcomes the K-means algorithm's sensitivity to the initial centers and its unstable space-time overhead, and improves the quality of the extracted subjects.
Owner:HOHAI UNIV

Cross-language recommendation method and system

The invention discloses a cross-language recommendation method and system. The method comprises the following steps: creating and updating a bilingual search term vector model based on users' search session logs, and mining the relevance of bilingual search terms; and creating and updating a bilingual concept vector model based on a Chinese-English bilingual parallel corpus, creating and updating a concept word vector model, and mining related bilingual concepts. The system comprises a search string preprocessing module for analyzing strings input by a user and filtering noise characters; a recommendation word calculation module, built on the bilingual search word vector model and the bilingual concept word vector model, for searching and calculating similar recommendation words; a long-tail search word processing module for handling uncommon, low-frequency search words by rewriting search words and searching for synonyms; and a result output module for showing the processed recommendation words to the user. The method and system have the following beneficial effects: no online manual translation is needed, which increases the user's search efficiency; the related-search-word recommendation method for long-tail search words increases recommendation coverage; the range of supported related search words is broadened; and, because the recommendation model is updated dynamically, it can promptly reflect the latest research hotspots and trends that users of the search system follow.
Owner:《中国学术期刊(光盘版)》电子杂志社有限公司

Semantic relevancy calculation method of document

The invention provides a semantic relevancy calculation method for documents. The method comprises the following steps: carrying out data preprocessing; establishing a mapping from terms to Wiki concept vectors in a relational database; inputting a first text and a second text whose semantic relevancy is to be calculated, and retrieving the Wiki concept vectors corresponding to all terms in each text; constructing a hierarchical Wiki catalogue; mapping the Wiki concept vectors of each text to the Wiki catalogue to construct Wiki catalogue vectors; and calculating the semantic relevancy of the first text and the second text from the Wiki catalogue vectors. The method is built on a framework that calculates semantic relevancy from Wiki concepts and the Wiki catalogue; semantic relevancy at different levels of abstraction is considered and combined to improve calculation precision, and a favorable man-machine interaction mechanism and scheduling policy are provided.
Owner:SHENZHEN GIISO INFORMATION TECH

Context-sensitive search using a deep learning model

A search engine is described herein for providing search results based on a context in which a query has been submitted, as expressed by context information. The search engine operates by ranking a plurality of documents based on a consideration of the query, and based, in part, on a context concept vector and a plurality of document concept vectors, both generated using a deep learning model (such as a deep neural network). The context concept vector is formed by a projection of the context information into a semantic space using the deep learning model. Each document concept vector is formed by a projection of document information, associated with a particular document, into the same semantic space using the deep learning model. The ranking operates by favoring documents that are relevant to the context within the semantic space, and disfavoring documents that are not relevant to the context.
Owner:MICROSOFT TECH LICENSING LLC

Wiki semantic matching-based document classification method and system

The invention discloses a wiki semantic matching-based document classification method and system. The method comprises the following steps of (1) obtaining a keyword set of a text document by utilizing keyword matching for each text document D in a document set, and performing matching in a wiki semantic reference space by utilizing a matching rule to obtain a reference concept set related to the text documents; (2) generating keyword vectors of the text document according to the keyword set of the text document, and generating concept vectors of the text document according to the keyword vectors and the reference concept set of the text document; (3) calculating comprehensive similarity between any two text documents in a plurality of to-be-classified text document sets according to the concept vectors and the keyword vectors; and (4) performing classification according to the comprehensive similarity between the any two text documents. The system comprises a first module, a second module, a third module and a fourth module. According to the method and the system, the contradiction between validity and high efficiency confronted by a wiki semantic matching method is overcome and an efficient online document classification method is provided.
Owner:WENZHOU UNIV OUJIANG COLLEGE

Knowledge graph completion, deduction and storage method and device based on entity concepts

The invention discloses a knowledge graph completion, deduction and storage method and a device based on entity concepts. The method comprises the steps of determining multiple concept vectors in one-to-one correspondence with multiple concepts of an entity and relationship vectors corresponding to relationships in a knowledge graph; determining an entity vector of the entity according to the plurality of concept vectors of the entity; calculating an unknown vector according to any two known vectors in the head entity vector, the tail entity vector and the relation vector of the unknown triple; and traversing the determined entity vectors or relationship vectors in the knowledge graph, determining the entity vector or relationship vector with the highest cosine similarity with the calculated unknown vector, and speculating the entity or relationship corresponding to the unknown vector so as to complement the knowledge graph. By the adoption of the method and the device, concept information and existing structural knowledge in the knowledge graph are fully fused, concepts and relations are vectorized, and the accuracy and expression capacity of a knowledge graph vectorization modeling result can be effectively improved.
Owner:CHINA ACADEMY OF ELECTRONICS & INFORMATION TECH OF CETC
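
The completion step in the abstract above can be sketched as follows. The abstract does not state how an entity vector is derived from its concept vectors or how the unknown vector is calculated from the two known ones, so this example makes two assumptions: the entity vector is the mean of its concept vectors, and the triple follows a translation-style relation, head + relation ≈ tail. All vectors, entity names and function names are hypothetical.

```python
import math

# Hypothetical concept vectors and entity-to-concept assignments.
CONCEPT_VECS = {
    "city":    [1.0, 0.0, 0.0],
    "country": [0.0, 1.0, 0.0],
    "french":  [0.0, 0.0, 1.0],
    "german":  [0.0, 0.0, -1.0],
}
ENTITY_CONCEPTS = {
    "Paris":   ["city", "french"],
    "France":  ["country", "french"],
    "Berlin":  ["city", "german"],
    "Germany": ["country", "german"],
}
# Hypothetical relation vector, chosen so that Paris + capital_of = France.
RELATION_VECS = {"capital_of": [-0.5, 0.5, 0.0]}

def entity_vector(concepts):
    """Assumed composition: the entity vector is the mean of its concept vectors."""
    vecs = [CONCEPT_VECS[c] for c in concepts]
    return [sum(col) / len(vecs) for col in zip(*vecs)]

ENTITY_VECS = {name: entity_vector(cs) for name, cs in ENTITY_CONCEPTS.items()}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def predict_tail(head, relation):
    """Unknown tail = head + relation (assumed); return the entity most similar to it."""
    target = [h + r for h, r in zip(ENTITY_VECS[head], RELATION_VECS[relation])]
    return max(ENTITY_VECS, key=lambda e: cosine(ENTITY_VECS[e], target))

def predict_head(tail, relation):
    """Unknown head = tail - relation (assumed); return the entity most similar to it."""
    target = [t - r for t, r in zip(ENTITY_VECS[tail], RELATION_VECS[relation])]
    return max(ENTITY_VECS, key=lambda e: cosine(ENTITY_VECS[e], target))
```

For example, `predict_tail("Berlin", "capital_of")` selects "Germany" in this toy graph, completing the missing triple by traversing all entity vectors and keeping the one with the highest cosine similarity.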

English word relevancy calculating method and device based on Wikipedia concept vectors

Active; CN107436955A; Accurate expression; Avoid the problem of not being able to accurately distinguish between different concepts; Natural language data processing; Special data processing applications; Degree of similarity; Cartesian product
The invention discloses an English word relevancy calculating method and device based on Wikipedia concept vectors. The method comprises the following steps: 1, raw linguistic data are obtained from the Wikipedia Dump service site and standardized to generate a Wikipedia basic corpus; 2, concept labeling and extension are performed, and a Wikipedia concept corpus is established; 3, the concept vectors are trained on the Wikipedia concept corpus; 4, for the word pair to be compared, the concept set of each word is obtained from Wikipedia; 5, the similarity of the concept vectors is calculated for each concept pair in the Cartesian product of the concept sets, and the maximum value is taken as the relevancy of the word pair. With this method, the word concept information contained in Wikipedia can be fully mined to generate word concept vectors, and word relevancy can be calculated more accurately and effectively.
Owner:QILU UNIV OF TECH
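
Step 5 of the abstract above (maximum similarity over the Cartesian product of two words' concept sets) is short enough to sketch directly. The concept vectors and word-to-concept sets below are hypothetical stand-ins for what steps 1 to 4 would produce, and cosine similarity is an assumed choice of similarity measure.

```python
import math
from itertools import product

# Hypothetical trained concept vectors (stand-ins for the output of step 3).
CONCEPT_VECS = {
    "Apple Inc.":     [0.9, 0.1],
    "Apple (fruit)":  [0.1, 0.9],
    "Microsoft":      [0.8, 0.2],
    "Orange (fruit)": [0.2, 0.8],
}
# Hypothetical word -> concept set mapping (stand-in for step 4).
WORD_CONCEPTS = {
    "apple":     ["Apple Inc.", "Apple (fruit)"],
    "microsoft": ["Microsoft"],
    "orange":    ["Orange (fruit)"],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def relevancy(word1, word2):
    """Maximum concept-vector similarity over all concept pairs of the two words."""
    pairs = product(WORD_CONCEPTS[word1], WORD_CONCEPTS[word2])
    return max(cosine(CONCEPT_VECS[a], CONCEPT_VECS[b]) for a, b in pairs)
```

Taking the maximum means an ambiguous word such as "apple" scores high against both "microsoft" and "orange", because each comparison picks the sense that fits best.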

Method and apparatus for learner diagnosis using reliability of cognitive diagnostic model

Provided are an apparatus and method for learner diagnosis using reliability of a cognitive diagnostic model, which estimates the reliability of the cognitive diagnostic model that estimates a concept vector (α) of a learner through a Q-matrix regarding a question and an R-matrix regarding a response to a question, the method including assuming a probability (P(X|α)) of a learner response (X) when a concept vector (α) of a learner is given; obtaining a concept pattern-specific probability (P(α|X)) of the learner from the assumed concept vector and learner response of the learner; obtaining an information entropy (H) value of the learner from the concept pattern-specific probability (P(α|X)) of the learner; and obtaining reliability (γ) of an estimated result of a learner-specific concept understanding using the information entropy value of the learner and a number of concepts.
Owner:ELECTRONICS & TELECOMM RES INST
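
The chain of quantities in the abstract above, P(X|α), then P(α|X), then the entropy H, then the reliability γ, can be sketched as follows. The abstract does not give the response model or the reliability formula, so three parts of this example are assumptions: a DINA-style likelihood with fixed slip and guess rates, a uniform prior over concept patterns, and γ = 1 - H/K, where K is the number of concepts and H is measured in bits. All names and numbers are hypothetical.

```python
import math
from itertools import product

def likelihood(responses, q_matrix, alpha, slip=0.1, guess=0.2):
    """P(X | alpha) under an assumed DINA-style model with fixed slip/guess rates."""
    p = 1.0
    for x, q_row in zip(responses, q_matrix):
        # The learner can answer a question if every concept it requires is mastered.
        mastered = all(a >= q for a, q in zip(alpha, q_row))
        p_correct = (1 - slip) if mastered else guess
        p *= p_correct if x == 1 else (1 - p_correct)
    return p

def diagnose(responses, q_matrix):
    """Return the most probable concept pattern, the entropy H and the reliability."""
    k = len(q_matrix[0])
    patterns = list(product([0, 1], repeat=k))
    weights = [likelihood(responses, q_matrix, a) for a in patterns]
    total = sum(weights)
    posterior = [w / total for w in weights]          # P(alpha | X), uniform prior assumed
    entropy = -sum(p * math.log2(p) for p in posterior if p > 0)
    reliability = 1 - entropy / k                     # assumed normalization by K bits
    best = patterns[posterior.index(max(posterior))]
    return best, entropy, reliability

# Hypothetical Q-matrix (3 questions x 2 concepts) and one learner's responses.
Q = [[1, 0], [0, 1], [1, 1]]
X = [1, 0, 0]
best, entropy, reliability = diagnose(X, Q)
```

With K concepts there are 2^K patterns, so H ranges from 0 to K bits and the assumed γ stays between 0 (posterior spread evenly over all patterns) and 1 (all posterior mass on one pattern).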

Method for detecting microblog topics

The invention discloses a method for detecting microblog topics. Microblog sets are selected and preprocessed by scanning them against a network word bank; after preprocessing, word segmentation, part-of-speech tagging and the like are conducted on the microblog sets to be processed through ICTCLAS; microblog word concepts are acquired and expanded through the HOWNET tool; the importance degree of the concepts is calculated through TFIDF, a concept vector space model is built for each post, and the microblog posts are gathered to form a post matrix model; the microblogs are clustered through a clustering algorithm, and the clustered microblog sets are the topic sets. Word segmentation, part-of-speech tagging and the like are conducted on the microblog sets to be processed through the ICTCLAS, and therefore topic detection time in the later period is prolonged. HOWNET is used as the tool, with synonyms and word-related attributes used as extensions to increase the amount of information, so that the problem of information sparsity is largely avoided and topic detection accuracy in the later period is greatly improved.
Owner:GUANGXI UNIVERSITY OF TECHNOLOGY
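
The TFIDF weighting step in the abstract above can be sketched as follows, starting from posts that have already been reduced to concept lists (the segmentation with ICTCLAS and the concept expansion with HowNet are not reproduced). The exact TF-IDF variant is not given in the abstract; this example assumes relative term frequency multiplied by log(N / df). The posts and the function name are hypothetical, and the final clustering step is left out.

```python
import math
from collections import Counter

def tfidf_vectors(posts):
    """posts: list of concept lists, one per microblog post.

    Returns the concept vocabulary and one TF-IDF row per post (the post matrix).
    """
    n = len(posts)
    # Document frequency: in how many posts each concept occurs.
    df = Counter(c for post in posts for c in set(post))
    vocab = sorted(df)
    matrix = []
    for post in posts:
        tf = Counter(post)
        row = [
            (tf[c] / len(post)) * math.log(n / df[c]) if post else 0.0
            for c in vocab
        ]
        matrix.append(row)
    return vocab, matrix

# Hypothetical posts, already mapped to concepts.
posts = [
    ["sport", "football"],
    ["sport", "weather"],
    ["weather", "rain", "rain"],
]
vocab, matrix = tfidf_vectors(posts)
```

Each row of `matrix` is the concept vector of one post; a clustering algorithm of choice could then be run on these rows to form the topic sets.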

Chinese word relevancy calculation method and device based on Wikipedia concept vectors

Active; CN107491524A; Accurate expression; Avoid the problem of not being able to accurately distinguish between different concepts; Natural language data processing; Special data processing applications; Chinese word; Cartesian product
The invention discloses a Chinese word relevancy calculation method and device based on Wikipedia concept vectors. The method comprises the following steps: 1, raw corpora are acquired from the Wikipedia Dump service site and normalized to generate a Wikipedia basic corpus; 2, concept labeling and expansion are conducted, and a Wikipedia concept corpus is built; 3, the concept vectors are trained on the Wikipedia concept corpus; 4, the word concept sets of the word pair to be compared are acquired from Wikipedia; 5, the similarities of the concept vectors corresponding to all the concept pairs in the Cartesian product of the concept sets are calculated, and the largest value is used as the similarity of the word pair to be compared. With this method and device, the word concept information contained in Wikipedia can be fully mined to generate word concept vectors, and word similarity can be calculated more accurately and effectively.
Owner:南方电网互联网服务有限公司

Text coding representation method based on transformer model and multiple reference systems

Active; CN110399454A; Solve the problem of polysemy that is difficult to learn; Machine learning; Text database querying; Transformer; Euclidean vector
The invention discloses a text coding representation method based on a transformer model and multiple reference systems. The method comprises the following steps: based on the word vector and separator vector coding results of a context text, splicing a word vector with the segmentation character vector of the sentence in which the word vector is located to obtain a spliced word vector; mapping the spliced word vector according to at least two set semantic concepts to obtain at least two semantic concept vectors of the word vector, wherein, when the absolute number of semantic concepts of the word vector is smaller than the set total number of semantic concepts, the semantic concept vectors of the word vector converge, finally leaving p dissimilar semantic concept vectors; selecting, through maximum pooling, the semantic concept vector that best suits the word vector in the current context from the dissimilar semantic concept vectors, and taking it as the semantic prediction result of the word vector in the current context; and obtaining a probability vector of the word vector, and determining the word probability under the semantic concept corresponding to the word vector according to the probability vector.
Owner:深思考人工智能机器人科技(北京)有限公司

Document classification method based on hadoop data mining

The invention discloses a document classification method based on hadoop data mining. The method comprises the following steps: A. preprocessing the data document to determine the keywords and the correspondence between each keyword and the document to which the keyword belongs; B. describing the attribute characteristics of data in a document by means of attribute feature transformation; C. using a matching rule to generate keyword vectors from a keyword set, and generating concept vectors according to the keyword vectors and the data attribute characteristics obtained in step B; D. calculating the similarity between any two text documents in the data document to be classified according to the keyword vectors and the concept vectors from step C; E. performing a classification operation based on the clustering process on the attribute vector and obtaining a classification result of the attribute vector, the classification result indicating the classification of the target object corresponding to each attribute vector; F. Hadoop automatically collects the above classification results and classifies the data documents. The invention has the remarkable advantages of easy implementation and high classification accuracy.
Owner:NANJING UNIV OF POSTS & TELECOMM

English concept vector generation method and device based on Wikipedia link structure

The invention discloses an English concept vector generation method and device based on a Wikipedia link structure. The method comprises the steps that according to a title concept and / or a link concept in an English Wikipedia page, a link information base is established; whether or not the link concept exists in a sample in the link information base is judged, positive training examples and negative training examples are established separately, and a training data set is established by selecting a certain number of positive training examples and negative training examples; a concept vector model is established, wherein the model comprises an input layer, an embedding layer, a concept vector operation layer and an output layer; the concept vector model is trained by adopting the training data set, and a concept vector is extracted from the concept vector model.
Owner:SHANDONG NORMAL UNIV

System for predictive analysis of time series data flows

This system for predictive analysis of digital data time flows associated with text information (18) comprises means (12) for automatically predicting values (24) of a digital data time flow from past values (22) of said flow. It further comprises means (14) for analyzing text information (18) to supply the prediction means (12) with a weighted concept vector (20) associated with the past values (22) of the digital data time flow.
Owner:FRANCE TELECOM SA

Personal big data management hierarchy concept vectorizing incrementation processing method

Provided is a personal big data management hierarchy concept vectorizing incrementation processing method. The method comprises the following steps: 1, when the system runs for the first time, all concepts are vectorized and a concept vector merging operation is performed on all branch nodes; 2, when a user operates on the concept tree, the following substeps are executed: 2.1, obtaining the concept vectors and total word counts of the operated node and its parent node; 2.2, modifying the concept vector of the parent node according to a formula; 2.3, repeating recursively from substep 2.1 with the parent node as the operated node until the root node is reached; and 2.4, updating the inverse document frequency vector; 3, when errors have accumulated to a certain degree, the following substeps are executed: 3.1, obtaining the current inverse document frequency vector and the initial-value inverse document frequency vector; 3.2, updating all vector weights in the vector space in a batch; and 3.3, updating the initial-value inverse document frequency vector. The method provides incremental calculation for vectorizing hierarchical concepts in personal big data management, so that the concept vectors in the concept space can be adjusted rapidly and execution efficiency is improved.
Owner:ZHEJIANG UNIV OF TECH

WEB resource-based ontology concept hierarchy acquisition method, system and storage medium

The invention provides a WEB resource-based ontology concept hierarchy acquisition method, which comprises the following steps: constructing query strings containing hierarchical relationships by using clue words, and acquiring corpora rich in hierarchical relationships from the Web through a search engine; constructing a concept vector space model by jointly using the relationship-rich corpora, encyclopedia knowledge entries and news documents obtained from the Web, and establishing a concept graph by fusing concept semantic similarity based on 'HowNet'; and, after a pruning operation is performed on the concept graph, using an improved hierarchical tree construction algorithm to obtain a clear hierarchical affiliation relationship between concepts. The accuracy of the hierarchical affiliation obtained by the scheme of the invention is clearly superior to that of the prior art, and a solid foundation is laid for semantic information interaction between humans and machines.
Owner:CAPITAL NORMAL UNIVERSITY