47 results about "Concept vector" patented technology

Information data retrieval, where the data is organized in terms, documents and document corpora

Status: Active | Publication: US20050149494A1 | Topics: Easy to solve; Easy to Feedback; Digital data processing details; Object oriented databases; Paper document; Document preparation

The invention relates to improved solutions for information retrieval, wherein the information is represented by digitized text data. This data is further presumed to be organized in terms (431-438), documents and document corpora, where each document contains at least one term (431-438) and each document corpus contains at least one document. Based on a concept vector (420-424), which conceptually classifies the contents of each document, a term-to-concept vector is generated for each term (431-438) in the document corpus. The term-to-concept vector describes a relationship between the term (431) and each of the concept vectors (420-424). On the basis of the term-to-concept vectors for the document corpus, a term-term matrix is generated which describes a term-to-term relationship between all the terms (431-438) in the document corpus. The term-term matrix may then be processed and used for retrieving information from the document corpus, such as the fact that a first term (431) is related to a second term (436).

Owner:ELUCIDON GROUP
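
The abstract above does not say how the term-to-concept vectors or the term-term matrix are computed, so the following is only a minimal sketch under stated assumptions: a term's vector is taken to be the sum of the concept vectors of the documents containing it, and the term-term matrix is filled with cosine similarities. The function names and the toy corpus are hypothetical.

```python
from math import sqrt

def term_to_concept(docs, doc_concepts):
    """Assumed aggregation: sum the concept vectors of every document containing the term."""
    n_concepts = len(doc_concepts[0])
    vectors = {}
    for terms, concept_vec in zip(docs, doc_concepts):
        for term in set(terms):
            acc = vectors.setdefault(term, [0.0] * n_concepts)
            for i, weight in enumerate(concept_vec):
                acc[i] += weight
    return vectors

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def term_term_matrix(vectors):
    """Assumed relation measure: cosine similarity between term-to-concept vectors."""
    terms = sorted(vectors)
    return terms, [[cosine(vectors[a], vectors[b]) for b in terms] for a in terms]

# Toy corpus: three documents, each with a two-dimensional concept vector.
docs = [["engine", "piston"], ["engine", "fuel"], ["violin", "bow"]]
doc_concepts = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
vectors = term_to_concept(docs, doc_concepts)
terms, matrix = term_term_matrix(vectors)
```

In this toy corpus, "engine" and "piston" end up strongly related while "engine" and "violin" do not, which is the kind of term-to-term fact the abstract says the matrix can surface.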

Method and system for classifying documents

Status: Inactive | Publication: US7849030B2 | Topics: Digital data information retrieval; Digital computer details; Context based; Documentation

The invention provides a method and system for classifying insurance files for identification, sorting and efficient collection of subrogation claims. The invention determines whether an insurance claim has merit to warrant claim recovery efforts utilizing software code for partially describing a set of documents having unstructured and structured file data containing terms and phrases having contextual bases, code for transforming the terms and phrases, code for iterating a classification process to determine rules that best classify the set of documents based upon context, code for incorporating the rules into an induction and knowledge representation, thesauri taxonomies and text summarization to classify subrogation claims; code for calculating a base score and a concept vector to identify the selected claims that demonstrate a given probability of subrogation recovery.

Owner:HARTFORD FIRE INSURANCE

Method and system for classifying documents

Status: Inactive | Publication: US20070282824A1 | Topics: Digital data information retrieval; Digital computer details; Document preparation; Context based

The invention provides a method and system for classifying insurance files for identification, sorting and efficient collection of subrogation claims. The invention determines whether an insurance claim has merit to warrant claim recovery efforts utilizing software code for partially describing a set of documents having unstructured and structured file data containing terms and phrases having contextual bases, code for transforming the terms and phrases, code for iterating a classification process to determine rules that best classify the set of documents based upon context, code for incorporating the rules into an induction and knowledge representation, thesauri taxonomies and text summarization to classify subrogation claims; code for calculating a base score and a concept vector to identify the selected claims that demonstrate a given probability of subrogation recovery.

Owner:HARTFORD FIRE INSURANCE

System and method for analyzing electronic data records

Status: Active | Publication: US20090083231A1 | Topics: Data processing applications; Digital data information retrieval; Degree of similarity; Data recording

A system and method for analyzing electronic data records including an annotation unit being operable to receive a set of electronic data records and to compute concept vectors for the set of electronic data records, wherein the coordinates of the concept vectors represent scores of the concepts in the respective electronic data record and wherein the concepts are part of an ontology, a similarity network unit being operable to compute a similarity network by means of the concept vectors and by at least one relationship between the concepts of the ontology, the similarity network representing similarities between the electronic data records, wherein the vertices of the similarity network represent the electronic data records and the edges of the similarity network represent similarity values indicating a degree of similarity between the vertices and steps for executing the system.

Owner:IBM CORP
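
As a rough illustration of the similarity network described above, the sketch below treats each record's concept vector as a list of concept scores and adds an edge whenever the cosine similarity of two records reaches a threshold. The cosine measure, the threshold and the record identifiers are assumptions; the abstract also uses relationships between ontology concepts, which this sketch leaves out.

```python
from itertools import combinations
from math import sqrt

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def similarity_network(concept_vectors, threshold=0.5):
    """Return edges (record_a, record_b, similarity) for pairs at or above the threshold."""
    edges = []
    for (a, u), (b, v) in combinations(sorted(concept_vectors.items()), 2):
        score = cosine(u, v)
        if score >= threshold:
            edges.append((a, b, score))
    return edges

# Hypothetical records scored against three ontology concepts.
records = {
    "r1": [0.9, 0.1, 0.0],
    "r2": [0.8, 0.2, 0.0],
    "r3": [0.0, 0.1, 0.9],
}
edges = similarity_network(records)
```

Here the vertices are the record identifiers and only "r1" and "r2" are connected, because "r3" scores on a different concept.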

Multi-subject extracting method based on semantic categories

Status: Active | Publication: CN103970729A | Topics: Special data processing applications; Subject matter; Document preparation

The invention provides a multi-subject extracting method based on semantic categories. The method comprises the following steps: first, a document is preprocessed with a traditional method to obtain a preliminary vector of feature words; second, synonyms are merged using the correspondence between word meanings and concepts in 'HowNet', polysemous words are disambiguated according to the correlation between the semantic categories and the context, and a concept vector model is constructed to represent the document; the concept vector model is then converted into a semantic category model according to the one-to-one correspondence between the concepts and the semantic categories; the concept similarity is calculated using the semantic information attached to the concepts in 'HowNet', and the semantic similarity is derived from it; the semantic categories are clustered with a K-means algorithm improved by presetting seeds, forming a plurality of subject semantic category clusters; finally, a plurality of sub-subject word sets are obtained by mapping back through the correspondences between the semantic categories and the concepts and between the concepts and the words. The method takes semantic information into account, overcomes the K-means algorithm's sensitivity to the initial centers and its unstable time and space cost, and improves the quality of the extracted subjects.

Owner:HOHAI UNIV
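
The "preset seeds" idea mentioned above can be sketched as ordinary K-means whose initial centers are supplied by the caller instead of being drawn at random. This is an illustrative sketch only: it uses squared Euclidean distance on plain numeric vectors as a stand-in for the HowNet-based semantic similarity, and the function names are hypothetical.

```python
def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def mean(points):
    """Component-wise mean of a non-empty list of points."""
    return [sum(col) / len(points) for col in zip(*points)]

def seeded_kmeans(points, seeds, max_iter=100):
    """K-means that starts from caller-supplied seed centers rather than random ones."""
    centers = [list(seed) for seed in seeds]
    assignment = None
    for _ in range(max_iter):
        new_assignment = [
            min(range(len(centers)), key=lambda k: squared_distance(p, centers[k]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # assignments are stable, so the centers will not move again
        assignment = new_assignment
        for k in range(len(centers)):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:  # keep the old center if a cluster is empty
                centers[k] = mean(members)
    return assignment, centers

points = [[0, 0], [0, 1], [10, 10], [10, 11]]
assignment, centers = seeded_kmeans(points, seeds=[[0, 0], [10, 10]])
```

Because the starting centers are fixed, repeated runs on the same data give the same clusters, which is the stability the abstract attributes to presetting seeds.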

Multi-document summarization method based on text segmentation

Status: Inactive | Publication: CN102945228A | Topics: Improve the efficiency of obtaining information; Practical; Special data processing applications; Document preparation; Vector space model

The invention belongs to the technical field of multi-document summarization and provides a multi-document summarization method based on text segmentation, which comprises the following steps of: using HowNet to obtain a concept, building a concept vector space model, conducting text segmentation by adopting an improved DotPlotting model and a sentence concept vector space, calculating sentence weight by using the built concept vector space model, generating a summary according to the sentence weight, the text segmentation and the similarity situation, and evaluating the generated summary by using the ROUGE-N evaluation method and using F_Score as an evaluation index. According to the result, the multi-document summarization by using a text segmentation technique is effective, relevant documents provided by users can be gathered to form a summary by adopting the multi-document summarization method, the summary is displayed to the users in a proper way, the information acquisition efficiency is greatly improved, the practicability is high and the popularization and application values are greater.

Owner:广西超宏科技有限公司

Information data retrieval, where the data is organized in terms, documents and document corpora

Status: Active | Publication: US7593932B2 | Topics: Easy to solve; Easy to Feedback; Data processing applications; Digital data processing details; Factoid; Document preparation

The invention relates to improved solutions for information retrieval, wherein the information is represented by digitized text data. This data is further presumed to be organized in terms (431-438), documents and document corpora, where each document contains at least one term (431-438) and each document corpus contains at least one document. Based on a concept vector (420-424), which conceptually classifies the contents of each document, a term-to-concept vector is generated for each term (431-438) in the document corpus. The term-to-concept vector describes a relationship between the term (431) and each of the concept vectors (420-424). On the basis of the term-to-concept vectors for the document corpus, a term-term matrix is generated which describes a term-to-term relationship between all the terms (431-438) in the document corpus. The term-term matrix may then be processed and used for retrieving information from the document corpus, such as the fact that a first term (431) is related to a second term (436).

Owner:ELUCIDON GROUP

Obtaining and Using a Distributed Representation of Concepts as Vectors

Status: Active | Publication: US20170032273A1 | Topics: Quality improvement; Mathematical models; Natural language data processing; Concept vector; Distributed representation

An approach is provided for automatically generating and processing concept vectors by extracting concept sequences from one or more content sources and generating a first concept vector for a first concept by supplying the concept sequences as inputs to a vector learning component, such that the first concept vector comprises information interrelating the first concept to other concepts in the concept sequences which is inferred from the concept sequences.

Owner:IBM CORP

Multi-subject extraction method based on concept vector model

Status: Inactive | Publication: CN104008090A | Topics: Special data processing applications; Concept cluster; Algorithm

The invention provides a multi-subject extraction method based on a concept vector model. The method includes the following steps: first, a document is preprocessed with a traditional method to obtain a preliminary vector of feature words; then synonyms are merged using the correspondence between word meanings and concepts in HowNet, polysemous words are disambiguated using the correlation between semantic classes and contexts, and the concept vector model is established to represent the document; concept similarity is calculated from the semantic information attached to the concepts in HowNet, the concepts are clustered with a K-means algorithm improved by a 'preset seed' method, and a plurality of subject concept clusters are formed; finally, a plurality of sub-subject term sets are acquired according to the correspondence between the concepts and the words. The method takes semantic information into account, overcomes defects such as the K-means algorithm's sensitivity to the initial centers and its unstable time and space cost, and improves the quality of the extracted subjects.

Owner:HOHAI UNIV

Analyzing Concepts Over Time

Status: Inactive | Publication: US20170083507A1 | Topics: Quality improvement; Market predictions; Relational databases; Computer science; Concept vector

A method and apparatus are provided for automatically generating and processing first and second concept vector sets extracted, respectively, from a first set of concept sequences and from a second, temporally separated, set of concept sequences by performing a natural language processing (NLP) analysis of the first concept vector set and second concept vector set to detect changes in the corpus over time by identifying changes for one or more concepts included in the first and/or second set of concept sequences.

Owner:IBM CORP

Method for extracting multiple subject terms from single Chinese text

Status: Inactive | Publication: CN103970730A | Topics: Special data processing applications; Concept cluster; Spacetime

The invention provides a method for extracting multiple subject terms from a single Chinese text. The method comprises the following steps: first, the text is preprocessed with a traditional method to obtain preliminary vectors of feature words; second, synonyms are merged using the correspondence between word meanings and concepts in 'HowNet', polysemous words are disambiguated according to the dependency between semantic types and the context, and a concept vector model is built to represent the text; third, the concept similarity is calculated from the semantic information attached to the concepts in 'HowNet', the concepts are clustered with a K-means algorithm improved by a 'seed presetting' method, and a plurality of subject concept clusters are formed; fourth, a plurality of sub-subject terms are obtained according to the correspondence between the concepts and the words. The method takes semantic information into account, overcomes the K-means algorithm's sensitivity to the initial centers and its unstable space overhead, and improves the quality of the extracted subjects.

Owner:HOHAI UNIV

Cross-language recommendation method and system

Status: Inactive | Publication: CN106055623A | Topics: Improve retrieval efficiency; Extended support; Natural language translation; Special data processing applications; Search words; Recommendation model

The invention discloses a cross-language recommendation method and system. The method comprises the following steps: creating and updating a bilingual search term vector model based on users' search session logs, and mining the relevance of bilingual search terms; and creating and updating a bilingual concept vector model based on a Chinese-English bilingual parallel corpus, creating and updating a concept word vector model, and mining related bilingual concepts. The system comprises: a search string pre-processing module for analyzing the strings entered by a user and filtering noise characters; a recommendation word calculation module, built on the bilingual search word vector model and the bilingual concept word vector model, for finding and scoring similar recommendation words; a long-tail search word processing module for handling uncommon, low-frequency search words by rewriting search words and looking up synonyms; and a result output module for showing the processed recommendation words to the user. The method and system have the following beneficial effects: no online manual translation is needed, which increases the user's search efficiency; recommending related search words for long-tail search words increases recommendation coverage and broadens the range of supported search words; and because the recommendation model is updated dynamically, it can promptly reflect the latest research hotspots and trends that users of the search system are following.

Owner:《中国学术期刊(光盘版)》电子杂志社有限公司

Semantic relevancy calculation method of document

Status: Active | Publication: CN105279264A | Topics: Semantic analysis; Relational databases; Relational database; Documentation

The invention provides a semantic relevancy calculation method of a document. The semantic relevancy calculation method comprises the following steps: carrying out data preprocessing; establishing mapping from a term to a Wiki concept vector in a relationship database; inputting a first text and a second text, and independently taking the Wiki concept vectors corresponding to all terms in the first text and the second text, wherein the first text and the second text need to carry out semantic relevancy calculation; constructing a hierarchical Wiki catalogue; independently mapping the Wiki concept vectors to the Wiki catalogue to construct Wiki catalogue vectors; and through the Wikipedia catalogue vectors, calculating the semantic relevancy of the first text and the second text. The semantic relevancy calculation method of the document is based on a calculation framework of the semantic relevancy of the Wiki concept and the Wiki catalogue, meanwhile, the semantic relevancies on different abstraction levels are considered and are organically combined to improve the calculation precision of the semantic relevancy, and meanwhile, a favorable man-machine interaction mechanism and a favorable scheduling policy are provided.

Owner:SHENZHEN GIISO INFORMATION TECH

Context-sensitive search using a deep learning model

Status: Active | Publication: CN106415535A | Topics: Neural architectures; Special data processing applications; Semantic space; Context based

A search engine is described herein for providing search results based on a context in which a query has been submitted, as expressed by context information. The search engine operates by ranking a plurality of documents based on a consideration of the query, and based, in part, on a context concept vector and a plurality of document concept vectors, both generated using a deep learning model (such as a deep neural network). The context concept vector is formed by a projection of the context information into a semantic space using the deep learning model. Each document concept vector is formed by a projection of document information, associated with a particular document, into the same semantic space using the deep learning model. The ranking operates by favoring documents that are relevant to the context within the semantic space, and disfavoring documents that are not relevant to the context.

Owner:MICROSOFT TECH LICENSING LLC

Wiki semantic matching-based document classification method and system

Status: Active | Publication: CN106372122A | Topics: Improve performance; Improve production efficiency; Special data processing applications; Text database clustering/classification; Reference space; Classification methods

The invention discloses a wiki semantic matching-based document classification method and system. The method comprises the following steps of (1) obtaining a keyword set of a text document by utilizing keyword matching for each text document D in a document set, and performing matching in a wiki semantic reference space by utilizing a matching rule to obtain a reference concept set related to the text documents; (2) generating keyword vectors of the text document according to the keyword set of the text document, and generating concept vectors of the text document according to the keyword vectors and the reference concept set of the text document; (3) calculating comprehensive similarity between any two text documents in a plurality of to-be-classified text document sets according to the concept vectors and the keyword vectors; and (4) performing classification according to the comprehensive similarity between the any two text documents. The system comprises a first module, a second module, a third module and a fourth module. According to the method and the system, the contradiction between validity and high efficiency confronted by a wiki semantic matching method is overcome and an efficient online document classification method is provided.

Owner:WENZHOU UNIV OUJIANG COLLEGE
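
Step (3) of the abstract above combines concept vectors and keyword vectors into a "comprehensive similarity" without giving the formula. One plausible reading, shown here purely as an assumption, is a weighted average of two cosine similarities; the weight `alpha` and the dictionary layout are hypothetical.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def comprehensive_similarity(doc_a, doc_b, alpha=0.5):
    """Assumed combination: alpha * concept similarity + (1 - alpha) * keyword similarity."""
    concept_sim = cosine(doc_a["concepts"], doc_b["concepts"])
    keyword_sim = cosine(doc_a["keywords"], doc_b["keywords"])
    return alpha * concept_sim + (1 - alpha) * keyword_sim

# Two documents that share a concept but no keywords.
d1 = {"keywords": [1, 0], "concepts": [1, 0]}
d2 = {"keywords": [0, 1], "concepts": [1, 0]}
score = comprehensive_similarity(d1, d2)
```

With `alpha=0.5` the two documents score 0.5: the shared concept contributes fully even though no keyword matches, which is the gap that concept vectors are meant to close.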

Knowledge graph completion, deduction and storage method and device based on entity concepts

Status: Inactive | Publication: CN110851613A | Topics: Improve accuracy; Enhance expressive ability; Semantic analysis; Neural learning methods; Cosine similarity; Algorithm

The invention discloses a knowledge graph completion, deduction and storage method and a device based on entity concepts. The method comprises the steps of determining multiple concept vectors in one-to-one correspondence with multiple concepts of an entity and relationship vectors corresponding to relationships in a knowledge graph; determining an entity vector of the entity according to the plurality of concept vectors of the entity; calculating an unknown vector according to any two known vectors in the head entity vector, the tail entity vector and the relation vector of the unknown triple; and traversing the determined entity vectors or relationship vectors in the knowledge graph, determining the entity vector or relationship vector with the highest cosine similarity with the calculated unknown vector, and speculating the entity or relationship corresponding to the unknown vector so as to complement the knowledge graph. By the adoption of the method and the device, concept information and existing structural knowledge in the knowledge graph are fully fused, concepts and relations are vectorized, and the accuracy and expression capacity of a knowledge graph vectorization modeling result can be effectively improved.

Owner:CHINA ACADEMY OF ELECTRONICS & INFORMATION TECH OF CETC
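
The completion step above, computing an unknown vector from two known ones and then picking the most cosine-similar candidate, can be sketched as follows. Two things are assumptions rather than statements from the abstract: an entity vector is taken as the average of its concept vectors, and the triple is assumed to satisfy a translation-style relation head + relation ≈ tail. All names and numbers are hypothetical.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def entity_vector(concept_vectors):
    """Assumed aggregation: average the entity's concept vectors."""
    n = len(concept_vectors)
    return [sum(col) / n for col in zip(*concept_vectors)]

def predict_tail(head, relation, entities):
    """Assume head + relation ≈ tail, then return the entity nearest by cosine similarity."""
    target = [h + r for h, r in zip(head, relation)]
    return max(entities, key=lambda name: cosine(target, entities[name]))

entities = {
    "Paris": entity_vector([[1.0, 0.0], [0.8, 0.2]]),
    "France": entity_vector([[0.0, 1.0]]),
    "Tokyo": entity_vector([[1.0, 0.1]]),
}
capital_of = [-0.9, 0.9]  # hypothetical relation vector
answer = predict_tail(entities["Paris"], capital_of, entities)
```

The same pattern would cover a missing head (tail minus relation) or a missing relation (tail minus head) under the same assumed relation.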

English word relevancy calculating method and device based on Wikipedia concept vectors

Status: Active | Publication: CN107436955A | Topics: Accurate expression; Avoid the problem of not being able to accurately distinguish between different concepts; Natural language data processing; Special data processing applications; Degree of similarity; Cartesian product

The invention discloses an English word relevancy calculating method and device based on Wikipedia concept vectors. The method comprises the following steps: 1, raw corpus data is obtained from the Wikipedia Dump service site and normalized to generate a Wikipedia basic corpus; 2, concept labeling and extension are performed, and a Wikipedia concept corpus is established; 3, the concept vectors are trained on the Wikipedia concept corpus; 4, for each word pair to be compared, the concept set of each word is obtained from Wikipedia; 5, for each concept pair in the Cartesian product of the concept sets, the similarity of the corresponding concept vectors is calculated, and the maximum value is taken as the relevancy of the word pair. With this method, the word concept information contained in Wikipedia can be fully mined to generate word concept vectors, and word relevancy can be calculated more accurately and effectively.

Owner:QILU UNIV OF TECH
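
Step 5 above, taking the maximum concept-vector similarity over the Cartesian product of two words' concept sets, might look like the sketch below. Cosine similarity is an assumption (the abstract only says "similarity"), and the concept names and vectors are made up for illustration.

```python
from itertools import product
from math import sqrt

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def word_relatedness(concepts_a, concepts_b, concept_vectors):
    """Maximum similarity over every (concept of word A, concept of word B) pair."""
    return max(
        (cosine(concept_vectors[a], concept_vectors[b])
         for a, b in product(concepts_a, concepts_b)),
        default=0.0,  # no concepts found for one of the words
    )

# Hypothetical trained concept vectors.
concept_vectors = {
    "Apple_Inc": [1.0, 0.0],
    "Apple_fruit": [0.1, 0.9],
    "Pear_fruit": [0.0, 1.0],
}
score = word_relatedness(["Apple_Inc", "Apple_fruit"], ["Pear_fruit"], concept_vectors)
```

Taking the maximum lets the fruit sense of "apple" determine its relatedness to "pear" while the company sense is ignored.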

System and method for analyzing electronic data records

Status: Active | Publication: US8108381B2 | Topics: Digital data information retrieval; Data processing applications; Degree of similarity; Data recording

A system and method for analyzing electronic data records including an annotation unit being operable to receive a set of electronic data records and to compute concept vectors for the set of electronic data records, wherein the coordinates of the concept vectors represent scores of the concepts in the respective electronic data record and wherein the concepts are part of an ontology, a similarity network unit being operable to compute a similarity network by means of the concept vectors and by at least one relationship between the concepts of the ontology, the similarity network representing similarities between the electronic data records, wherein the vertices of the similarity network represent the electronic data records and the edges of the similarity network represent similarity values indicating a degree of similarity between the vertices and steps for executing the system.

Owner:IBM CORP

Method and apparatus for learner diagnosis using reliability of cognitive diagnostic model

Status: Inactive | Publication: US20190318650A1 | Topics: Data processing applications; Machine learning; Algorithm; Probit

Provided are an apparatus and method for learner diagnosis using reliability of a cognitive diagnostic model, which estimates the reliability of the cognitive diagnostic model that estimates a concept vector (α) of a learner through a Q-matrix regarding a question and an R-matrix regarding a response to a question, the method including assuming a probability (P(X|α)) of a learner response (X) when a concept vector (α) of a learner is given; obtaining a concept pattern-specific probability (P(α|X)) of the learner from the assumed concept vector and learner response of the learner; obtaining an information entropy (H) value of the learner from the concept pattern-specific probability (P(α|X)) of the learner; and obtaining reliability (γ) of an estimated result of a learner-specific concept understanding using the information entropy value of the learner and a number of concepts.

Owner:ELECTRONICS & TELECOMM RES INST
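
The chain from P(X|α) to P(α|X), then to the entropy H and the reliability γ, can be illustrated as below. The abstract does not give the reliability formula, so this sketch makes two explicit assumptions: a uniform prior over concept patterns, and γ = 1 − H/K with H in bits and K the number of concepts (K bits is the maximum entropy over 2^K patterns).

```python
from math import log2

def posterior(likelihoods):
    """P(alpha | X) from P(X | alpha), assuming a uniform prior over concept patterns."""
    total = sum(likelihoods.values())
    return {pattern: p / total for pattern, p in likelihoods.items()}

def entropy_bits(dist):
    """Shannon entropy in bits; zero-probability patterns contribute nothing."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

def reliability(dist, n_concepts):
    """Assumed form: 1 - H / K, so 1.0 means fully certain and 0.0 means uniform."""
    return 1.0 - entropy_bits(dist) / n_concepts

# Two concepts give four possible mastery patterns.
uncertain = posterior({(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25})
certain = posterior({(0, 0): 0.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 1.0})
gamma_uncertain = reliability(uncertain, n_concepts=2)
gamma_certain = reliability(certain, n_concepts=2)
```

Under these assumptions a learner whose responses fit every pattern equally well gets a reliability of 0, and one whose responses single out a pattern gets 1.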

Method for detecting microblog topics

Status: Inactive | Publication: CN103810280A | Topics: Avoid the problem of sparseness; Improve accuracy; Web data indexing; Special data processing applications; Cluster algorithm; Microblogging

The invention discloses a method for detecting microblog topics. Microblog sets are selected and preprocessed by scanning them against a network word bank; after preprocessing, word segmentation, part-of-speech tagging and the like are conducted on the microblog sets with ICTCLAS; microblog word concepts are acquired and expanded with the HowNet tool; the importance of each concept is calculated with TF-IDF, a concept vector space model is built for each post, and the microblog posts are gathered to form a post matrix model; the microblogs are clustered with a clustering algorithm, and the clustered microblog sets are the topic sets. Word segmentation, part-of-speech tagging and the like are conducted on the microblog sets to be processed through the ICTCLAS, and therefore topic detection time in the later period is prolonged. HowNet is used as the tool, with synonyms and word-related attributes as extensions to increase the amount of information, so the problem of information sparsity is largely avoided and the accuracy of later topic detection is greatly improved.

Owner:GUANGXI UNIVERSITY OF TECHNOLOGY
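
The TF-IDF weighting step above can be sketched for posts that have already been reduced to lists of concepts. The exact TF-IDF variant is not given in the abstract, so the plain form tf × ln(N / df) used here is an assumption, and the example concepts are hypothetical. Segmentation with ICTCLAS, concept expansion with HowNet and the final clustering are outside this sketch.

```python
from collections import Counter
from math import log

def tfidf_concept_vectors(posts):
    """Build one TF-IDF vector per post over the sorted concept vocabulary."""
    n_posts = len(posts)
    doc_freq = Counter(c for post in posts for c in set(post))
    vocab = sorted(doc_freq)
    vectors = []
    for post in posts:
        counts = Counter(post)
        length = len(post) or 1  # avoid dividing by zero for an empty post
        vectors.append([counts[c] / length * log(n_posts / doc_freq[c]) for c in vocab])
    return vocab, vectors

# Each post is a list of concepts obtained after preprocessing.
posts = [
    ["weather", "rain", "rain"],
    ["weather", "football"],
    ["football", "goal"],
]
vocab, vectors = tfidf_concept_vectors(posts)
```

The rows of `vectors` together form the post matrix, which could then be handed to any clustering algorithm to group posts into topics.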

Chinese word relevancy calculation method and device based on Wikipedia concept vectors

Status: Active | Publication: CN107491524A | Topics: Accurate expression; Avoid the problem of not being able to accurately distinguish between different concepts; Natural language data processing; Special data processing applications; Chinese word; Cartesian product

The invention discloses a Chinese word relevancy calculation method and device based on Wikipedia concept vectors. The method comprises the following steps: 1, raw corpora are acquired from the Wikipedia Dump service site and normalized to generate a Wikipedia basic corpus; 2, concept labeling and expansion are conducted, and a Wikipedia concept corpus is built; 3, the concept vectors are trained on the Wikipedia concept corpus; 4, the word concept sets of the word pair to be compared are acquired from Wikipedia; 5, the similarities of the concept vectors corresponding to all the concept pairs in the Cartesian product of the concept sets are calculated, and the largest value is used as the similarity of the word pair to be compared. With this method and device, the word concept information contained in Wikipedia can be fully mined to generate word concept vectors, and the word similarity can be calculated more accurately and effectively.

Owner:南方电网互联网服务有限公司

Text coding representation method based on transformer model and multiple reference systems

Status: Active | Publication: CN110399454A | Topics: Solve the problem of polysemy that is difficult to learn; Machine learning; Text database querying; Transformer; Euclidean vector

The invention discloses a text coding representation method based on a transformer model and multiple reference systems. The method comprises the following steps: based on the encoded word vectors and separator vectors of a context text, splicing each word vector with the separator vector of the sentence in which it is located to obtain a spliced word vector; mapping the spliced word vector according to at least two preset semantic concepts to obtain at least two semantic concept vectors of the word vector, where, when the absolute number of semantic concepts of the word vector is smaller than the preset total number of semantic concepts, the semantic concept vectors of the word vector converge, so that p dissimilar semantic concept vectors finally remain; selecting, through maximum pooling, the semantic concept vector that best suits the word vector in the current context from the dissimilar semantic concept vectors, and taking it as the semantic prediction result of the word vector in the current context; and obtaining a probability vector of the word vector, and determining from the probability vector the word probability under the semantic concept corresponding to the word vector.

Owner:深思考人工智能机器人科技(北京)有限公司

Analyzing Concepts Over Time

Inactive | US20170286832A1 | Quality improvement | Market predictions | Digital data information retrieval | Concept vector | Data science

A method and apparatus are provided for automatically generating and processing first and second concept vector sets extracted, respectively, from a first set of concept sequences and from a second, temporally separated set of concept sequences, by performing a natural language processing (NLP) analysis of the first and second concept vector sets to detect changes in the corpus over time by identifying changes for one or more concepts included in the first and/or second set of concept sequences.

Owner:INT BUSINESS MASCH CORP
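
One way to picture the comparison of two temporally separated concept vector sets is the sketch below. It is only an illustration under stated assumptions: both sets are assumed to live in the same vector space, cosine similarity and the 0.8 threshold are arbitrary choices, and the function name `concept_changes` and the toy data are hypothetical. The abstract does not specify how changes are detected.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def concept_changes(earlier, later, threshold=0.8):
    """Label concepts that appeared, disappeared, or shifted between two periods.

    `earlier` and `later` map concept names to vectors. A concept present in
    both periods is reported as "shifted" when the similarity of its two
    vectors falls below `threshold`; unchanged concepts are omitted.
    """
    report = {}
    for concept in sorted(set(earlier) | set(later)):
        if concept not in earlier:
            report[concept] = "appeared"
        elif concept not in later:
            report[concept] = "disappeared"
        elif cosine(earlier[concept], later[concept]) < threshold:
            report[concept] = "shifted"
    return report


# Hypothetical concept vector sets for two time periods.
earlier = {"cloud": [1.0, 0.0], "mainframe": [0.0, 1.0], "server": [1.0, 1.0]}
later = {"cloud": [0.0, 1.0], "server": [1.0, 1.0], "blockchain": [1.0, 0.0]}
changes = concept_changes(earlier, later)
```

If the two vector sets were trained separately, they would normally need to be aligned into a common space before a comparison like this is meaningful.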

Document classification method based on hadoop data mining

Inactive | CN108268620A | Easy to implement | Improve classification accuracy | Special data processing applications | Text database clustering/classification | Data mining | Documentation

The invention discloses a document classification method based on Hadoop data mining. The method comprises the following steps: A. preprocessing the data documents to determine the keywords and the correspondence between each keyword and the document to which it belongs; B. describing the attribute characteristics of the data in a document by means of attribute feature transformation; C. using a matching rule to generate keyword vectors from a keyword set, and generating concept vectors from the keyword vectors and the data attribute characteristics obtained in step B; D. calculating the similarity between any two text documents among the data documents to be classified according to the keyword vectors and the concept vectors from step C; E. performing a clustering-based classification operation on the attribute vectors to obtain a classification result, the classification result indicating the class of the target object corresponding to each attribute vector; F. Hadoop automatically collects the above classification results and classifies the data documents to be classified. The invention has the advantages of easy implementation and high classification accuracy.

Owner:NANJING UNIV OF POSTS & TELECOMM

Analyzing Concepts Over Time

Inactive | US20170286833A1 | Quality improvement | Market predictions | Digital data information retrieval | Concept vector | Computer science

A method and apparatus are provided for automatically generating and processing first and second concept vector sets extracted, respectively, from a first set of concept sequences and from a second, temporally separated set of concept sequences, by performing a natural language processing (NLP) analysis of the first and second concept vector sets to detect changes in the corpus over time by identifying changes for one or more concepts included in the first and/or second set of concept sequences.

Owner:INT BUSINESS MASCH CORP

English concept vector generation method and device based on Wikipedia link structure

Active | CN108132928A | Efficient preprocessing | Semantic representation is accurate | Semantic analysis | Character and pattern recognition | Information repository | Data set

The invention discloses an English concept vector generation method and device based on a Wikipedia link structure. The method comprises the following steps: a link information base is established according to the title concepts and/or link concepts in English Wikipedia pages; it is judged whether the link concept exists in a sample in the link information base, positive training examples and negative training examples are established accordingly, and a training data set is built by selecting a certain number of positive and negative training examples; a concept vector model is established, comprising an input layer, an embedding layer, a concept vector operation layer and an output layer; the concept vector model is trained on the training data set, and the concept vectors are extracted from the trained model.

Owner:SHANDONG NORMAL UNIV
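
The construction of positive and negative training examples from a link information base can be sketched as below. This is one plausible reading of the abstract, not the patented procedure: a pair is treated as positive when the page of the title concept links to the other concept and as negative otherwise. The function name `build_training_set`, the dictionary layout and the toy data are hypothetical, and the model itself (embedding and concept vector operation layers) is not shown.

```python
import random


def build_training_set(links, n_pos, n_neg, seed=0):
    """Sample labelled (title concept, other concept, label) triples.

    `links` maps each title concept to the set of concepts its page links to.
    Label 1 marks a pair found in the link information base, label 0 a pair
    that is absent from it.
    """
    concepts = sorted(set(links) | {c for targets in links.values() for c in targets})
    positives = [(t, c, 1) for t in sorted(links) for c in sorted(links[t])]
    negatives = [
        (t, c, 0)
        for t in sorted(links)
        for c in concepts
        if c != t and c not in links[t]
    ]
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    chosen = rng.sample(positives, min(n_pos, len(positives)))
    chosen += rng.sample(negatives, min(n_neg, len(negatives)))
    rng.shuffle(chosen)
    return chosen


# Hypothetical link information base with two title concepts.
links = {
    "Python (programming language)": {"Guido van Rossum", "Programming language"},
    "Java (programming language)": {"James Gosling", "Programming language"},
}
training_set = build_training_set(links, n_pos=3, n_neg=3)
```

Each triple could then be fed to a model that looks up the two concepts in an embedding layer and predicts the label, after which the embedding rows serve as concept vectors.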

System for predictive analysis of time series data flows

Inactive | US7299214B2 | Easy to predict | Avoid poor results | Finance | Digital computer details | Digital data | Data stream

This system for predictive analysis of digital data time flows associated with text information (18) comprises means (12) for automatically predicting values (24) of a digital data time flow from past values (22) of said flow. It further comprises means (14) for analyzing text information (18) to supply the prediction means (12) with a weighted concept vector (20) associated with the past values (22) of the digital data time flow.

Owner:FRANCE TELECOM SA

Personal big data management hierarchy concept vectorizing incrementation processing method

Active | CN106682129A | Effective regulation | Guaranteed vectorization effect | Special data processing applications | NODAL | Relationship - Father

Provided is a personal big data management hierarchy concept vectorizing incrementation processing method. The method comprises the following steps: 1, when the system runs for the first time, all concepts are vectorized, and a concept vector merging operation is performed on all branch nodes; 2, when a user operates on the concept tree, the following substeps are executed: 2.1, obtaining the concept vectors and total word counts of the operated node and its parent node; 2.2, modifying the concept vector of the parent node according to a formula; 2.3, repeating recursively from substep 2.1 with the parent node as the operated node until the root node is reached; and 2.4, updating the inverse document frequency vector; 3, when errors have accumulated to a certain degree, the following substeps are executed: 3.1, obtaining the current inverse document frequency vector and the initial inverse document frequency vector; 3.2, updating all vector weights in the vector space in a batch; and 3.3, updating the initial inverse document frequency vector. The method realizes incremental calculation of hierarchy concept vectors for personal big data management, so that the concept vectors in the concept space can be adjusted rapidly and execution efficiency is improved.

Owner:ZHEJIANG UNIV OF TECH
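
Substeps 2.1 to 2.3 (update the parent's concept vector, then walk up to the root) can be illustrated with the sketch below. The abstract does not give the update formula, so a word-count-weighted average is assumed here. The class `ConceptNode`, the function `attach` and the toy tree are hypothetical, and the inverse document frequency updates of substep 2.4 and step 3 are not covered.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ConceptNode:
    name: str
    vector: List[float]   # concept vector of this node
    word_count: int       # total number of words summarised by the vector
    parent: Optional["ConceptNode"] = None


def attach(child, parent):
    """Attach `child` under `parent` and update every ancestor incrementally.

    Assumed formula: new_vector = (n_p * v_p + n_c * v_c) / (n_p + n_c),
    applied to the parent and then to each ancestor up to the root, so only
    one path of the tree is recomputed.
    """
    if child.word_count <= 0:
        raise ValueError("child must contain at least one word")
    child.parent = parent
    node = parent
    while node is not None:
        total = node.word_count + child.word_count
        node.vector = [
            (node.word_count * p + child.word_count * c) / total
            for p, c in zip(node.vector, child.vector)
        ]
        node.word_count = total
        node = node.parent


# Hypothetical tree: root -> work, then a new note is added under "work".
root = ConceptNode("root", [1.0, 0.0], 300)
work = ConceptNode("work", [1.0, 0.0], 100, parent=root)
note = ConceptNode("note", [0.0, 1.0], 100)
attach(note, work)
```

Because each ancestor stores its total word count, the update touches only the nodes between the operated node and the root instead of re-vectorizing the whole tree.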

A Multi-topic Extraction Method Based on Semantic Classes

Active | CN103970729B | Special data processing applications | Subject matter | Degree of similarity

The invention provides a multi-subject extracting method based on semantic categories. The method comprises the following steps: firstly, a document is preprocessed according to a traditional method and a vector composed of feature words is obtained preliminarily; secondly, synonyms are merged using the correspondence between word meanings and concepts in 'HowNet', polysemous words are disambiguated according to the correlation between the semantic categories and the context, and a concept vector model is constructed to represent the document; then the concept vector model is converted into a semantic category model according to the one-to-one correspondence between the concepts and the semantic categories; the concept similarity is calculated using the related semantic information of the concepts in 'HowNet', and the semantic similarity is then obtained; the semantic categories are clustered with a K-means algorithm improved by presetting seeds, and a plurality of subject semantic category clusters are formed; finally, a plurality of sub-subject word sets are obtained in reverse according to the correspondences between the semantic categories and the concepts and between the concepts and the words. The method takes semantic information into account, overcomes the K-means algorithm's sensitivity to the initial centers and its unstable time and space costs, and improves the quality of the extracted subjects.

Owner:HOHAI UNIV
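
The idea of presetting seeds for K-means can be sketched as follows. This is a generic illustration rather than the patented algorithm: it uses squared Euclidean distance on numeric vectors, whereas the abstract clusters semantic categories by a HowNet-based semantic similarity. The function names and the toy points are hypothetical.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def seeded_kmeans(points, seeds, max_iter=100):
    """K-means whose initial centers are the given seeds instead of random picks."""
    centers = [list(seed) for seed in seeds]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(len(centers)), key=lambda k: squared_distance(p, centers[k]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centers will not move again
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for k in range(len(centers)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Hypothetical data: two obvious groups and one preset seed per group.
points = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
labels, centers = seeded_kmeans(points, seeds=[[0.0, 0.0], [10.0, 10.0]])
```

Fixing the seeds makes the result deterministic, which is the property the abstract relies on when it says the method overcomes sensitivity to the initial centers.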

WEB resource-based ontology concept hierarchy acquisition method, system and storage medium

Pending | CN112364175A | Improve accuracy | Special data processing applications | Semantic tool creation | Query string | Engineering

The invention provides a WEB resource-based ontology concept hierarchy acquisition method, which comprises the following steps: a query string containing hierarchical relationships is constructed using clue words, and corpora rich in hierarchical relationships are acquired from the Web by means of a search engine; a concept vector space model is constructed by jointly using the relation-rich corpora, encyclopedia knowledge entries and news documents obtained from the Web, and a concept graph is established by fusing in concept semantic similarity based on 'HowNet'; after a pruning operation is performed on the concept graph, an improved hierarchical tree construction algorithm is used to obtain a clear hierarchical affiliation relationship between concepts. The accuracy of the hierarchical affiliations obtained by the scheme of the invention is markedly higher than that of the prior art, laying a solid foundation for semantic information interaction between humans and machines.

Owner:CAPITAL NORMAL UNIVERSITY
